On Wed, 18 Aug 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Date: Wed, 18 Aug 2010 18:08:49 -0400 (EDT)
>
> > I don't think such microoptimizations can be measured. It may save an
> > I-cacheline --- but who knows if exactly this cacheline makes some effect
> > or not?
>
> These routines, when contention backoff is disabled, have
> intentionally been coded to be perfectly 8 instructions, which is
> exactly 32 bytes, which is exactly 1 I-cache line. You'll find that
> much of the by-hand sparc64 assembler routines have been written to be
> a multiple of 8 instructions.

They are not:

0000000000000000 <atomic_add>:
0000000000000028 <atomic_sub>:
0000000000000050 <atomic_add_ret>:
000000000000007c <atomic_sub_ret>:
00000000000000a8 <atomic64_add>:
00000000000000d0 <atomic64_sub>:
00000000000000f8 <atomic64_add_ret>:
0000000000000124 <atomic64_sub_ret>:

(on a UP compile without backoff). That dummy backoff code produces jumps
forward and backward.

> Because if you don't start a function on an I-cache line you get a
> partial fetch when it's called, therefore making it impossible to fill
> the pipeline even if the instructions could be executed in parallel.

So add .align there.

> So actually your changes are likely to hurt performance from a cache
> line and pipelining viewpoint.
>
> Furthermore, talking about saving one cycle (which I don't even think
> you'll get) when the CAS instruction itself is going to stall the chip
> for ~50 cycles is not all that worthwhile either.

If it's like x86, i.e. if it flushes the whole pipe and executes microcode,
then it doesn't make much sense to optimize ticks in the pipeline.
Optimizing for cache pollution could make some sense.

Mikulas

> The UltraSPARC-I,II,III et al. programming manuals are pretty clear
> about code generation guidelines, I've been reading them for 10+
> years, and that is what I've used to guide the writing of the
> assembler code.
> I've also run the code through simulators (when
> possible) and done cycle analysis (both hot and cold cache cases) on
> real hardware for these routines.
>
> So I basically expect the same kind of considerations from you if you
> want to "optimize" this code :-)
>
> I value your contribution but seriously I think the code is fine and
> optimal as-is.
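The size argument in this thread can be checked mechanically from the objdump offsets quoted earlier. The sketch below is only an illustration under stated assumptions: every SPARC instruction is 4 bytes, the I-cache line is the 32 bytes (8 instructions) cited in the thread, and `.align 32` is modeled as simple padding in front of each routine. The helpers `function_sizes`, `align_up` and `aligned_starts` are hypothetical names, not kernel or binutils code.

```python
# Symbol offsets copied from the objdump listing in the thread
# (UP build, contention backoff disabled).
OFFSETS = [
    (0x000, "atomic_add"),
    (0x028, "atomic_sub"),
    (0x050, "atomic_add_ret"),
    (0x07C, "atomic_sub_ret"),
    (0x0A8, "atomic64_add"),
    (0x0D0, "atomic64_sub"),
    (0x0F8, "atomic64_add_ret"),
    (0x124, "atomic64_sub_ret"),
]

INSN_BYTES = 4    # SPARC instructions are a fixed 4 bytes wide
ICACHE_LINE = 32  # assumed from the thread: 8 instructions per I-cache line


def function_sizes(offsets):
    """Size in bytes of each function whose successor symbol is known.

    The last symbol has no successor in the listing, so its size
    cannot be derived and it is left out.
    """
    return {
        name: nxt - start
        for (start, name), (nxt, _) in zip(offsets, offsets[1:])
    }


def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary."""
    return (addr + boundary - 1) // boundary * boundary


def aligned_starts(offsets, boundary=ICACHE_LINE):
    """Hypothetical start offsets if each routine were preceded by .align.

    Models alignment as padding only; the routines themselves keep
    the sizes derived from the original listing.
    """
    sizes = function_sizes(offsets)
    starts, pc = {}, 0
    for _, name in offsets:
        pc = align_up(pc, boundary)
        starts[name] = pc
        pc += sizes.get(name, 0)  # last routine's size is unknown
    return starts


if __name__ == "__main__":
    for name, size in function_sizes(OFFSETS).items():
        print(f"{name:18} {size:3} bytes = {size // INSN_BYTES} insns")
    misaligned = [name for off, name in OFFSETS if off % ICACHE_LINE]
    print("not on a 32-byte boundary:", ", ".join(misaligned))
    for name, start in aligned_starts(OFFSETS).items():
        print(f"with .align 32: {name:18} at {start:#05x}")
```

Under these assumptions the routines come out at 40 or 44 bytes (10 or 11 instructions), not 32, and only atomic_add starts on a 32-byte boundary. Padding each one to a line boundary would put every routine at a 64-byte stride, since none of them fits in a single 32-byte line any more.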