Re: [PATCH] vfs: move fdput() to right place in ksys_sync_file_range()

Brian Vazquez <brianvv@xxxxxxxxxx> · Thu, 12 May 2022 05:48:25 -0700

Sure, let me take a look.

On Wed, May 11, 2022 at 7:04 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, May 11, 2022 at 2:43 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > [bpf folks Cc'd]
> >
> > On Wed, May 11, 2022 at 07:01:34PM +0000, Eric Biggers wrote:
> > > On Wed, May 11, 2022 at 04:51:34PM +0100, Matthew Wilcox wrote:
> > > > On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote:
> > > > > Move fdput() to right place in ksys_sync_file_range() to
> > > > > avoid fdput() after failed fdget().
> > > >
> > > > Why?  fdput() is already conditional on FDPUT_FPUT so you're ...
> > > > optimising the failure case?
> > >
> > > "fdput() after failed fdget()" has confused people before, so IMO it's worth
> > > cleaning this up.  But the commit message should make clear that it's a cleanup,
> > > not a bug fix.  Also I recommend using an early return:
> > >
> > >       f = fdget(fd);
> > >       if (!f.file)
> > >               return -EBADF;
> > >       ret = sync_file_range(f.file, offset, nbytes, flags);
> > >       fdput(f);
> > >       return ret;
> >
> > FWIW, fdput() after failed fdget() is rare, but there's no fundamental reasons
> > why it would be wrong.  No objections against that patch, anyway.
> >
> > Out of curiousity, I've just looked at the existing users.  In mainline we have
> > 203 callers of fdput()/fdput_pos(); all but 7 never get reached with NULL ->file.
> >
> > 1) There's ksys_sync_file_range(), kernel_read_file_from_fd() and ksys_readahead() -
> > all with similar pattern.  I'm not sure that for readahead(2) "not opened for
> > read" should yield the same error as "bad descriptor", but since it's been a part
> > of userland ABI for a while...
> >
> > 2) two callers in perf_event_open(2) are playing silly buggers with explicit
> >         struct fd group = {NULL, 0};
> > and rely upon "fdput() is a no-op if we hadn't touched that" (note that if
> > we try to touch it and get NULL ->file from fdget(), we do not hit those fdput()
> > at all).
> >
> > 3) ovl_aio_put() is hard to follow (and some of the callers are poking
> > where they shouldn't), no idea if it's correct.  struct fd is manually
> > constructed there, anyway.
> >
> > 4) bpf generic_map_update_batch() is really asking for trouble.  The comment in
> > there is wrong:
> >         f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */
> > *NOTHING* we'd done earlier can guarantee that.  We might have a descriptor
> > table shared with another thread, and it might have very well done dup2() since
> > the last time we'd looked things up.  IOW, this fdget() is racy - the function
> > assumes it refers to the same thing that gave us map back in bpf_map_do_batch(),
> > but it's not guaranteed at all.
> >
> > I hadn't put together a reproducer, but that code is very suspicious.  As a general
> > rule, you should treat descriptor table as shared object, modifiable by other
> > threads.  It can be explicitly locked and it can be explicitly unshared, but
> > short of that doing a lookup for the same descriptor twice in a row can yield
> > different results.
> >
> > What's going on there?  Do you really want the same struct file you've got back in
> > bpf_map_do_batch() (i.e. the one you've got the map from)?  What should happen
> > if the descriptor changes its meaning during (or after) the operation?
>
> Interesting.
> If I got this right... in the following:
>
> f = fdget(ufd);
> map = __bpf_map_get(f);
> if (IS_ERR(map))
>    return PTR_ERR(map);
> ...
> f = fdget(ufd);
> here there are no guarantees that 'f' is valid and points
> to the same map.
> Argh. In hindsight that makes sense.
>
> generic_map_update_batch calls bpf_map_update_value.
> That could use 'f' for prog_array, fd_array and hash_of_maps
> types of maps.
> The first two types don't' define .map_update_batch callback.
> So BPF_DO_BATCH(map->ops->map_update_batch); will error out
> with -ENOTSUPP since that callback is NULL for that map type.
>
> The hash_of_maps does seem to support it, but
> that's an odd one to use with batch access.
>
> Anyhow we certainly need to clean this up.
>
> Brian,
> do you mind fixing it up, since you've added that
> secondary fdget() in the first place?
>
> Thanks!