Re: [PATCH bpf-next 11/15] bpf: Add bpf_sys_close() helper.

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sat, 17 Apr 2021 10:09:31 -0700

On Sat, Apr 17, 2021 at 04:48:53PM +0000, Al Viro wrote:
> On Sat, Apr 17, 2021 at 07:36:39AM -0700, Alexei Starovoitov wrote:
> 
> > The kernel will perform the same work with FDs. The same locks are held
> > and the same execution conditions are in both cases. The LSM hooks,
> > fsnotify, etc will be called the same way.
> > It's no different if new syscall was introduced "sys_foo(int num)" that
> > would do { return close_fd(num); }.
> > It would opearate in the same user context.
> 
> Hmm...  unless I'm misreading the code, one of the call chains would seem to
> be sys_bpf() -> bpf_prog_test_run() -> ->test_run() -> ... -> bpf_sys_close().
> OK, as long as you make sure bpf_prog_get() does fdput() (i.e. that we
> don't have it restructured so that fdget()/fdput() pair would be lifted into
> bpf_prog_test_run(), with fdput() moved in place of bpf_prog_put()).

Got it. There is no fdget/put bracketing in the code.
On the way to test_run we do __bpf_prog_get() which does fdget and immediately
fdput after incrementing refcnt of the prog.
I believe this pattern is consistent everywhere in kernel/bpf/*

> Note that we *really* can not allow close_fd() on anything to be bracketed
> by fdget()/fdput() pair; we had bugs of that sort and, as the matter of fact,
> still have one in autofs_dev_ioctl().
> 
> The trouble happens if you have file F with 2 references, held by descriptor
> tables of different processes.  Say, process A has descriptor 6 refering to
> it, while B has descriptor 42 doing the same.  Descriptor tables of A and B
> are not shared with anyone.
> 
> A: fdget(6) 	-> returns a reference to F, refcount _not_ touched
> A: close_fd(6)	-> rips the reference to F from descriptor table, does fput(F)
> 		   refcount drops to 1.
> B: close(42)	-> rips the reference to F from B's descriptor table, does fput(F)
> 		   This time refcount does reach 0 and we use task_work_add() to
> 		   make sure the destructor (__fput()) runs before B returns to
> 		   userland.  sys_close() returns and B goes off to userland.
> 		   On the way out __fput() is run, and among other things,
> 		   ->release() of F is executed, doing whatever it wants to do.
> 		   F is freed.
> And at that point A, which presumably is using the guts of F, gets screwed.

Thanks for these details. That's really helpful.

> 	So please, mark all call sites with "make very sure you never get
> here with unpaired fdget()".

Good point. Will add this comment.

> 	BTW, if my reading (re ->test_run()) is correct, what limits the recursion
> via bpf_sys_bpf()?

Glad you asked! This kind of code review questions are much appreciated.

It's an allowlist of possible commands in bpf_sys_bpf().
'case BPF_PROG_TEST_RUN:' is not there for this exact reason.
I'll add a comment to make it more obvious.