On 12/3/18 2:10 AM, David Laight wrote:
> From: Vineet Gupta
> ...
>>> It also seems to have used a different type of loop to the
>>> other example, probably less efficient.
>>> (Not that I'm an expert on ARC opcodes.)
>> The difference is due to ISA and ensuing ARC gcc backends. ARCompact based cores
>> don't support unaligned access and the loop there was ZOL (Zero delay loop). In
>> ARCv2 based cores, the gcc backend has been tweaked to generate fewer ZOLs hence
>> you see the more canonical tst and branch style loop.
> Is this another case of the hardware implementing 'hardware' loop
> instructions that execute slower than ones made of simple instructions?

Not really. ZOLs allow for hardware loops with no instruction/cycle overhead
in general. However, as micro-architectures get more complicated, newer
"gizmos" are added to the machinery, which sometimes makes it harder for
compilers to optimize for all cases. The ARCv2 ISA has a new DBNZ instruction
(similar to the x86 one you refer to below) to implement loops, and that is
preferred over ZOL.

> The worst example has to be the x86 'loop' (dec cx and jump nz)
> instruction which is microcoded on intel cpus.
> That makes it very difficult to use the new addx instruction to
> get two dependency chains through a loop.

_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
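
For readers following the loop discussion, here is a minimal C sketch of the loop shape being talked about: a count-down loop (decrement, branch if non-zero) that carries two independent accumulators, i.e. two dependency chains. The function name sum_words_countdown is hypothetical and is not taken from the kernel checksum code. Whether a given compiler turns this into DBNZ, a ZOL, or a plain test-and-branch sequence depends on the target and compiler version, so treat it as an illustration of the source-level pattern only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: sums n 32-bit words into a
 * 64-bit total. The main loop counts down to zero, which is the shape a
 * decrement-and-branch-if-non-zero instruction is designed for. Two
 * accumulators keep the additions in two independent dependency chains.
 */
uint64_t sum_words_countdown(const uint32_t *words, size_t n)
{
    uint64_t even = 0, odd = 0;   /* two independent dependency chains */
    size_t pairs = n / 2;         /* number of two-word iterations */

    assert(words != NULL || n == 0);

    if (pairs != 0) {
        do {
            even += words[0];
            odd  += words[1];
            words += 2;
        } while (--pairs != 0);   /* decrement, loop while non-zero */
    }

    if (n & 1)                    /* odd word count: one word left over */
        even += *words;

    return even + odd;
}
```

Inspecting the generated assembly for the target of interest is the only reliable way to see which loop form the compiler actually picked.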