On Sat, Apr 17, 2021 at 04:48:53PM +0000, Al Viro wrote:
> On Sat, Apr 17, 2021 at 07:36:39AM -0700, Alexei Starovoitov wrote:
>
> > The kernel will perform the same work with FDs. The same locks are held
> > and the same execution conditions are in both cases. The LSM hooks,
> > fsnotify, etc will be called the same way.
> > It's no different if new syscall was introduced "sys_foo(int num)" that
> > would do { return close_fd(num); }.
> > It would operate in the same user context.
>
> Hmm... unless I'm misreading the code, one of the call chains would seem to
> be sys_bpf() -> bpf_prog_test_run() -> ->test_run() -> ... -> bpf_sys_close().
> OK, as long as you make sure bpf_prog_get() does fdput() (i.e. that we
> don't have it restructured so that fdget()/fdput() pair would be lifted into
> bpf_prog_test_run(), with fdput() moved in place of bpf_prog_put()).

Got it. There is no fdget/put bracketing in the code.
On the way to test_run we do __bpf_prog_get(), which does fdget and then
fdput immediately after incrementing the refcnt of the prog.
I believe this pattern is consistent everywhere in kernel/bpf/*

> Note that we *really* can not allow close_fd() on anything to be bracketed
> by fdget()/fdput() pair; we had bugs of that sort and, as the matter of fact,
> still have one in autofs_dev_ioctl().
>
> The trouble happens if you have file F with 2 references, held by descriptor
> tables of different processes. Say, process A has descriptor 6 referring to
> it, while B has descriptor 42 doing the same. Descriptor tables of A and B
> are not shared with anyone.
>
> A: fdget(6) -> returns a reference to F, refcount _not_ touched
> A: close_fd(6) -> rips the reference to F from descriptor table, does fput(F)
>    refcount drops to 1.
> B: close(42) -> rips the reference to F from B's descriptor table, does fput(F)
>    This time refcount does reach 0 and we use task_work_add() to
>    make sure the destructor (__fput()) runs before B returns to
>    userland. sys_close() returns and B goes off to userland.
>    On the way out __fput() is run, and among other things,
>    ->release() of F is executed, doing whatever it wants to do.
>    F is freed.
> And at that point A, which presumably is using the guts of F, gets screwed.

Thanks for these details. That's really helpful.

> So please, mark all call sites with "make very sure you never get
> here with unpaired fdget()".

Good point. Will add this comment.

> BTW, if my reading (re ->test_run()) is correct, what limits the recursion
> via bpf_sys_bpf()?

Glad you asked! Code review questions like this are much appreciated.
It's an allowlist of possible commands in bpf_sys_bpf().
'case BPF_PROG_TEST_RUN:' is not there for this exact reason.
I'll add a comment to make it more obvious.
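
The A/B sequence quoted above can be replayed in a small userspace model.
This is only a sketch under stated assumptions: every toy_* name is
hypothetical and is not a kernel API, the "file" is a refcount plus a flag,
and the destructor runs immediately instead of being deferred through
task_work_add(). It shows why a borrowed, uncounted fdget() reference must
never be held across close_fd(), and how a counted reference avoids the
problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FDS 64

/* Hypothetical stand-in for struct file: a refcount and a "destroyed" flag. */
struct toy_file {
	int refcount;
	bool released;	/* set once the model's destructor has run */
};

/* Hypothetical descriptor table, assumed private to one process. */
struct toy_fdtable {
	struct toy_file *slot[TOY_MAX_FDS];
};

/* Models fput(): drop one reference, run the destructor at zero. */
static void toy_fput(struct toy_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->released = true;	/* ->release() would run here */
}

/* Models close_fd(): rip the reference out of the table and drop it. */
static int toy_close_fd(struct toy_fdtable *t, int fd)
{
	struct toy_file *f;

	if (fd < 0 || fd >= TOY_MAX_FDS || !t->slot[fd])
		return -1;
	f = t->slot[fd];
	t->slot[fd] = NULL;
	toy_fput(f);
	return 0;
}

/*
 * Replays the quoted sequence. With take_ref == false, A borrows the
 * table's reference (refcount untouched, as in the quoted fdget() case).
 * With take_ref == true, A holds its own counted reference instead.
 * Returns true if A's pointer refers to an already-destroyed file.
 */
static bool toy_replay(bool take_ref)
{
	struct toy_file f = { .refcount = 2, .released = false };
	struct toy_fdtable a = { { NULL } }, b = { { NULL } };
	struct toy_file *held;
	bool dangling;

	a.slot[6] = &f;
	b.slot[42] = &f;

	held = a.slot[6];		/* A: fdget(6) */
	if (take_ref)
		held->refcount++;	/* counted reference instead */

	toy_close_fd(&a, 6);		/* A: close_fd(6) */
	toy_close_fd(&b, 42);		/* B: close(42) */

	dangling = held->released;	/* is A now using a dead file? */
	if (take_ref)
		toy_fput(held);		/* A drops its own reference */
	return dangling;
}
```

In this model toy_replay(false) reports that A is left pointing at a
destroyed file, while toy_replay(true) does not, because A's own reference
keeps the count above zero until A is done.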