On 7/22/21 10:59 PM, Al Viro wrote: > On Tue, Apr 20, 2021 at 12:03:33PM +0100, Pavel Begunkov wrote: >> Just a bit of code tossing in io_sq_offload_create(), so it looks a bit >> better. No functional changes. > > Does a use-after-free count as a functional change? > >> f = fdget(p->wq_fd); > > Descriptor table is shared with another thread, grabbed a reference to file. > Refcount is 2 (1 from descriptor table, 1 held by us) > >> if (!f.file) >> return -ENXIO; > > Nope, not NULL. > >> - if (f.file->f_op != &io_uring_fops) { >> - fdput(f); >> - return -EINVAL; >> - } >> fdput(f); > > Decrement refcount, get preempted away. f.file->f_count is 1 now. > > Another thread: close() on the same descriptor. Final reference to > struct file (from descriptor table) is gone, file closed, memory freed. > > Regain CPU... > >> + if (f.file->f_op != &io_uring_fops) >> + return -EINVAL; > > ... and dereference an already freed structure. > > What scares me here is that you are playing with bloody fundamental objects, > without understanding even the basics regarding their handling ;-/ Yes, it's a stupid bug and slipped through accidentally, not proud of it. And it's obvious to anyone that it shouldn't be touched after a put, so have no clue why there is such a long explanation. Anyway, thanks for letting know. By luck, it should be of low severity, as it's a compatibility check, the result of which is not depended upon by any code after. To fault would need some RAM hot-remove (?). Not an excuse, how you put it, but useful to notice. > 1) descriptor tables can be shared. > 2) another thread can close file right under you. > 3) once all references to opened file are gone, it gets shut down and > struct file gets freed. > 4) inside an fdget()/fdput() pair you are guaranteed that (3) won't happen. > As soon as you've done fdput(), that promise is gone. > > In the above only (1) might have been non-obvious, because if you > accept _that_, you have to ask yourself what the fuck would prevent file > disappearing once you've done fdput(), seeing that it might be the last > thing your syscall is doing to the damn thing. So either that would've > leaked it, or _something_ in the operations you've done to it must've > made it possible for close(2) to get the damn thing. And dereferencing > ->f_op is unlikely to be that, isn't it? Which leaves fdput() the > only candidate. It's common sense stuff... > > Again, descriptor table is a shared resource and threads sharing > it can issue syscalls at the same time. Sure, I've got fewer excuses > than you do for lack of documentation, but that's really basic... > -- Pavel Begunkov