On Fri, Feb 21, 2020 at 3:24 PM David Howells <dhowells@xxxxxxxxxx> wrote: > > Jann Horn <jannh@xxxxxxxxxx> wrote: > > > > + if (!s->s_watchers) { > > > > READ_ONCE() ? > > I'm not sure it matters. It can only be set once, and the next time we read > it we're inside the lock. And at this point, I don't actually dereference it, > and if it's non-NULL, it's not going to change. I'd really like these READ_ONCE() things to be *anywhere* the value can concurrently change, for two reasons: First, it tells the reader "keep in mind that this value may concurrently change in some way, don't just assume that it'll stay the same". But also, it tells the compiler that if it generates multiple loads here and assumes that they return the same value, *really* bad stuff may happen. GCC has some really fun behavior when compiling a switch() on a value that might change concurrently without using READ_ONCE(): It sometimes generates multiple loads, where the first load is used to test whether the value is in a specific range and then the second load is used for actually indexing into a table of jump destinations. If the value is concurrently mutated from an in-bounds value to an out-of-bounds value, this code will load a jump destination from random out-of-bounds memory. An example: $ cat gcc-jump.c int blah(int *x, int y) { switch (*x) { case 0: return y+1; case 1: return y*2; case 2: return y-3; case 3: return y^1; case 4: return y+6; case 5: return y-5; case 6: return y|1; case 7: return y&4; case 8: return y|5; case 9: return y-3; case 10: return y&8; case 11: return y|9; default: return y; } } $ gcc-9 -O2 -c -o gcc-jump.o gcc-jump.c $ objdump -dr gcc-jump.o [...] 0000000000000000 <blah>: 0: 83 3f 0b cmpl $0xb,(%rdi) 3: 0f 87 00 00 00 00 ja 9 <blah+0x9> 5: R_X86_64_PC32 .text.unlikely-0x4 9: 8b 07 mov (%rdi),%eax b: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx # 12 <blah+0x12> e: R_X86_64_PC32 .rodata-0x4 12: 48 63 04 82 movslq (%rdx,%rax,4),%rax 16: 48 01 d0 add %rdx,%rax 19: ff e0 jmpq *%rax [...] Or if you want to see a full example that actually crashes: $ cat gcc-jump-crash.c #include <pthread.h> int mutating_number; __attribute__((noinline)) int blah(int *x, int y) { switch (*x) { case 0: return y+1; case 1: return y*2; case 2: return y-3; case 3: return y^1; case 4: return y+6; case 5: return y-5; case 6: return y|1; case 7: return y&4; case 8: return y|5; case 9: return y-3; case 10: return y&8; case 11: return y|9; default: return y; } } int blah_num; void *thread_fn(void *dummy) { while (1) { blah_num = blah(&mutating_number, blah_num); } } int main(void) { pthread_t thread; pthread_create(&thread, NULL, thread_fn, NULL); while (1) { *(volatile int *)&mutating_number = 1; *(volatile int *)&mutating_number = 100000000; } } $ gcc-9 -O2 -pthread -o gcc-jump-crash gcc-jump-crash.c -ggdb -Wall $ gdb ./gcc-jump-crash [...] (gdb) run [...] Thread 2 "gcc-jump-crash" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff7db6700 (LWP 33237)] 0x00005555555551a2 in blah (x=0x555555558034 <mutating_number>, y=0) at gcc-jump-crash.c:6 6 switch (*x) { (gdb) x/10i blah 0x555555555190 <blah>: cmp DWORD PTR [rdi],0xb 0x555555555193 <blah+3>: ja 0x555555555050 <blah+4294966976> 0x555555555199 <blah+9>: mov eax,DWORD PTR [rdi] 0x55555555519b <blah+11>: lea rdx,[rip+0xe62] # 0x555555556004 => 0x5555555551a2 <blah+18>: movsxd rax,DWORD PTR [rdx+rax*4] 0x5555555551a6 <blah+22>: add rax,rdx 0x5555555551a9 <blah+25>: jmp rax 0x5555555551ab <blah+27>: nop DWORD PTR [rax+rax*1+0x0] 0x5555555551b0 <blah+32>: lea eax,[rsi-0x3] 0x5555555551b3 <blah+35>: ret (gdb) Here's a presentation from Felix Wilhelm, a security researcher who managed to find a case in the Xen hypervisor where a switch() on a value in shared memory was exploitable to compromise the hypervisor from inside a guest (see slides 35 and following): <https://www.blackhat.com/docs/us-16/materials/us-16-Wilhelm-Xenpwn-Breaking-Paravirtualized-Devices.pdf> I realize that a compiler is extremely unlikely to make such an optimization decision for a simple "if (!a->b)" branch; but still, I would prefer to have READ_ONCE() everywhere where it is semantically required, not just everywhere where you can think of a concrete compiler optimization that will break stuff. > > > + ret = add_watch_to_object(watch, s->s_watchers); > > > + if (ret == 0) { > > > + spin_lock(&sb_lock); > > > + s->s_count++; > > > + spin_unlock(&sb_lock); > > > > Where is the corresponding decrement of s->s_count? I'm guessing that > > it should be in the ->release_watch() handler, except that there isn't > > one... > > Um. Good question. I think this should do the job: > > static void sb_release_watch(struct watch *watch) > { > put_super(watch->private); > } > > And this then has to be set later: > > init_watch_list(wlist, sb_release_watch); (And as in the other case, the s->s_count increment will probably have to be moved above the add_watch_to_object(), unless you hold the sb_lock around it?)