That looks like the difference between micro-ops and macro-ops. x86 instructions can include a load or store, which are broken up and scheduled separately.
Jonathan Kang says:
The key is that the x86 instructions that are most commonly used by compilers are relatively simple instructions and those are the instructions x86 vendors optimize their designs for.
The rare instructions that aren’t used very often are handled by firmware or microcode anyway.
So in the end, real x86 programs are pretty RISC anyway.
foobar says:
There are some cases where the availability of barrel shifter logic integrated into other instructions on ARM can really shine on microbenchmarks (talking of highly optimised inner loops consisting of a dozen instructions or fewer); at the same time, lack of some specific instructions such as parallel bits extract and deposit can provide a significant benefit on x86. I wonder how things are on vectored instruction sets.
foobar says:
Eh, availability of such instructions on x86, of course – and lack of them on ARM (at least Apple for now).
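For readers who have not met parallel bits extract and deposit (PEXT and PDEP in x86's BMI2 extension), here is a minimal software sketch of what the two operations compute. The function names `pext` and `pdep` are just illustrative; this is a bit-at-a-time reference model, not how hardware or any particular library implements them.

```python
def pext(value: int, mask: int) -> int:
    """Gather the bits of `value` selected by `mask` into the low bits of the result."""
    result = 0
    out_pos = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1           # clear that bit and move to the next one
    return result


def pdep(value: int, mask: int) -> int:
    """Scatter the low bits of `value` to the bit positions selected by `mask`."""
    result = 0
    in_pos = 0
    while mask:
        lowest = mask & -mask      # next destination position
        if (value >> in_pos) & 1:
            result |= lowest
        in_pos += 1
        mask &= mask - 1
    return result
```

The loop runs once per set bit in the mask, which is why a single-instruction version is attractive in tight inner loops. Depositing what was extracted gives back the masked input: `pdep(pext(v, m), m) == v & m`.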
Anonymous says:
“because ARM instructions are less powerful and do less work than x64 (Intel/AMD) instructions so that we have performance parity.”
Anyone who thinks this has not looked at the ARM ISA in much detail and thinks aarch64 is classic RISC. It is not. If CISC is the state of being not RISC, then aarch64 is CISC. It has common instructions like load pair with autoincrement (which updates 3 GPRs; such instructions are rare even in x86), ALU operations with shifts, etc. There are some instructions in NEON that have to be (sanely) implemented as a long sequence of ops.
There may be a case for this sort of argument against RISC-V, which sometimes needs 3 instructions to do what one aarch64/x86 instruction does, like a load with base + scaled index + displacement. Maybe it is an issue there.
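To make the base + scaled index + displacement point concrete, here is a rough sketch that models the same load two ways: once as a single fused address computation, and once as the shift, add, load sequence that base RISC-V would need. Memory is modelled as a plain dict, and the function names and the mnemonics in the comments are illustrative assumptions, not output from a real assembler.

```python
MASK64 = (1 << 64) - 1  # model 64-bit wrap-around address arithmetic


def load_fused(mem, base, index, scale_shift, disp):
    """One instruction: the address is base + (index << scale_shift) + disp."""
    return mem[(base + (index << scale_shift) + disp) & MASK64]


def load_three_step(mem, base, index, scale_shift, disp):
    """The same load as three separate steps, each standing in for one instruction."""
    t = (index << scale_shift) & MASK64   # slli t, index, scale_shift
    t = (t + base) & MASK64               # add  t, t, base
    return mem[(t + disp) & MASK64]       # ld   rd, disp(t)


# Hypothetical memory: one 8-byte element at base 0x1000, index 3, displacement 16.
mem = {0x1000 + (3 << 3) + 16: 42}
fused_value = load_fused(mem, 0x1000, 3, 3, 16)
split_value = load_three_step(mem, 0x1000, 3, 3, 16)
```

Both paths read the same location; the difference is only in how many instructions the front end has to fetch and decode. As far as I know, the RISC-V Zba extension adds shift-and-add instructions that fold the first two steps into one, so the gap narrows on cores that implement it.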