2013-10-17 13:35-0400, Steven Rostedt: > On Thu, 17 Oct 2013 12:10:28 +0200 > Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote: > > > We implemented the optimized branch selection in higher levels of api. > > That made static_keys very unintuitive, so this patch introduces another > > element to jump_table, carrying one bit that tells the underlying code > > which branch to optimize. > > > > It is now possible to select optimized branch for every jump_entry. > > > > Current side effect is 1/3 increase increase in space, we could: > > * use bitmasks and selectors on 2+ aligned code/struct. > > - aligning jump target is easy, but because it is not done by default > > and few bytes in .text are much worse that few kilos in .data, > > I chose not to > > - data is probably aligned by default on all current architectures, > > but programmer can force misalignment of static_key > > * optimize each architecture independently > > - I can't test everything and this patch shouldn't break anything, so > > others can contribute in the future > > * choose something worse, like packing or splitting > > * ignore > > > > proof: example & x86_64 disassembly: (F = ffffffff) > > > > struct static_key flexible_feature; > > noinline void jump_label_experiment(void) { > > if ( static_key_false(&flexible_feature)) > > asm ("push 0xa1"); > > else asm ("push 0xa0"); > > if (!static_key_false(&flexible_feature)) > > asm ("push 0xb0"); > > else asm ("push 0xb1"); > > if ( static_key_true(&flexible_feature)) > > asm ("push 0xc1"); > > else asm ("push 0xc0"); > > if (!static_key_true(&flexible_feature)) > > asm ("push 0xd0"); > > else asm ("push 0xd1"); > > } > > > > Disassembly of section .text: (push marked by "->") > > > > F81002000 <jump_label_experiment>: > > F81002000: e8 7b 29 75 00 callq F81754980 <__fentry__> > > F81002005: 55 push %rbp > > F81002006: 48 89 e5 mov %rsp,%rbp > > F81002009: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > F8100200e: -> ff 34 25 a0 00 00 00 pushq 0xa0 > > F81002015: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > F8100201a: -> ff 34 25 b0 00 00 00 pushq 0xb0 > > F81002021: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > F81002026: -> ff 34 25 c1 00 00 00 pushq 0xc1 > > F8100202d: 0f 1f 00 nopl (%rax) > > F81002030: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > F81002035: -> ff 34 25 d1 00 00 00 pushq 0xd1 > > F8100203c: 5d pop %rbp > > F8100203d: 0f 1f 00 nopl (%rax) > > F81002040: c3 retq > > This looks exactly like what we want. I take it this is with your > patch. What was the result before the patch? Yes, this is after the patch. The branches would (should) be the same without patch, but static_key_true() was defined as !static_key_false(), so this piece of code was invalid before, because half of them would be patched to use the wrong branch. > -- Steve > > > F81002041: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) > > F81002048: -> ff 34 25 d0 00 00 00 pushq 0xd0 > > F8100204f: 5d pop %rbp > > F81002050: c3 retq > > F81002051: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) > > F81002058: -> ff 34 25 c0 00 00 00 pushq 0xc0 > > F8100205f: 90 nop > > F81002060: eb cb jmp F8100202d <[...]+0x2d> > > F81002062: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) > > F81002068: -> ff 34 25 b1 00 00 00 pushq 0xb1 > > F8100206f: 90 nop > > F81002070: eb af jmp F81002021 <[...]+0x21> > > F81002072: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) > > F81002078: -> ff 34 25 a1 00 00 00 pushq 0xa1 > > F8100207f: 90 nop > > F81002080: eb 93 jmp F81002015 <[...]+0x15> > > F81002082: 66 66 66 66 66 2e 0f [...] > > F81002089: 1f 84 00 00 00 00 00 > > > > Contents of section .data: (relevant part of embedded __jump_table)