On Wed, Sep 11, 2013 at 07:43:42PM -0700, Linus Torvalds wrote: > On Wed, Sep 11, 2013 at 7:20 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > I split the thing up into two macros GEN_UNARY_RMWcc and > > GEN_BINARY_RMWcc which ends up being more readable as well as smaller > > code overall. > > Yes, that looks like the right abstraction level. Powerful without > being complicated. > That said, the very same memory clobber may be what makes this whole > approach a loss, if it causes gcc to do tons of reloads or other > random things. Random things below, not sure why > For other cases? Who knows.. A lot of the "change and test atomically" > things have the same containment need, so it might not be a problem. Right, all atomic ops already contain a memory clobber to go along with their memory barrier semantics. So I did a defconfig build without the patch and with the patch and got a significant size increase: text data bss filename 11173443 1423736 1183744 defconfig-build/vmlinux.before 11174516 1423736 1183744 defconfig-build/vmlinux.after I then undid all the bitop conversions that added the extra memory clobber and got: 11174204 1423736 1183744 defconfig-build/vmlinux.after Which is still a significant increase, so I had a look at what GCC generates, for mm/rmap.c, which uses quite a few atomic_*_and_test(), functions I got: text data bss dec hex filename 8675 1 16 8692 21f4 defconfig-build/mm/rmap.o 8739 1 16 8756 2234 defconfig-build/mm/rmap.o So the increase is there too, doing a objdump -D on them the first difference is: 0000000000000660 <do_page_add_anon_rmap>: 660: 55 push %rbp 661: 48 89 e5 mov %rsp,%rbp 664: 48 83 ec 20 sub $0x20,%rsp 668: 48 89 5d f0 mov %rbx,-0x10(%rbp) 66c: 4c 89 65 f8 mov %r12,-0x8(%rbp) 670: 48 89 fb mov %rdi,%rbx 673: f0 ff 47 18 lock incl 0x18(%rdi) 677: 0f 94 c0 sete %al 67a: 84 c0 test %al,%al 67c: 75 12 jne 690 <do_page_add_anon_rmap+0x30> 67e: 48 8b 5d f0 mov -0x10(%rbp),%rbx 682: 4c 8b 65 f8 mov -0x8(%rbp),%r12 686: c9 leaveq vs.: 0000000000000660 <do_page_add_anon_rmap>: 660: 55 push %rbp 661: 48 89 e5 mov %rsp,%rbp 664: 48 83 ec 20 sub $0x20,%rsp 668: 48 89 5d e0 mov %rbx,-0x20(%rbp) 66c: 4c 89 65 e8 mov %r12,-0x18(%rbp) 670: 48 89 fb mov %rdi,%rbx 673: 4c 89 6d f0 mov %r13,-0x10(%rbp) 677: 4c 89 75 f8 mov %r14,-0x8(%rbp) 67b: f0 ff 47 18 lock incl 0x18(%rdi) 67f: 74 17 je 698 <do_page_add_anon_rmap+0x38> 681: 48 8b 5d e0 mov -0x20(%rbp),%rbx 685: 4c 8b 65 e8 mov -0x18(%rbp),%r12 689: 4c 8b 6d f0 mov -0x10(%rbp),%r13 68d: 4c 8b 75 f8 mov -0x8(%rbp),%r14 691: c9 leaveq For some obscure (to me) reason the new fangled asm goto construct generates a bunch of extra MOVs. OTOH the good news is that a kernel with that patch applied does indeed boot properly on real hardware. -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html