Sure, let me take a look. On Wed, May 11, 2022 at 7:04 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Wed, May 11, 2022 at 2:43 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > [bpf folks Cc'd] > > > > On Wed, May 11, 2022 at 07:01:34PM +0000, Eric Biggers wrote: > > > On Wed, May 11, 2022 at 04:51:34PM +0100, Matthew Wilcox wrote: > > > > On Wed, May 11, 2022 at 11:45:03AM -0400, Chengguang Xu wrote: > > > > > Move fdput() to right place in ksys_sync_file_range() to > > > > > avoid fdput() after failed fdget(). > > > > > > > > Why? fdput() is already conditional on FDPUT_FPUT so you're ... > > > > optimising the failure case? > > > > > > "fdput() after failed fdget()" has confused people before, so IMO it's worth > > > cleaning this up. But the commit message should make clear that it's a cleanup, > > > not a bug fix. Also I recommend using an early return: > > > > > > f = fdget(fd); > > > if (!f.file) > > > return -EBADF; > > > ret = sync_file_range(f.file, offset, nbytes, flags); > > > fdput(f); > > > return ret; > > > > FWIW, fdput() after failed fdget() is rare, but there's no fundamental reasons > > why it would be wrong. No objections against that patch, anyway. > > > > Out of curiousity, I've just looked at the existing users. In mainline we have > > 203 callers of fdput()/fdput_pos(); all but 7 never get reached with NULL ->file. > > > > 1) There's ksys_sync_file_range(), kernel_read_file_from_fd() and ksys_readahead() - > > all with similar pattern. I'm not sure that for readahead(2) "not opened for > > read" should yield the same error as "bad descriptor", but since it's been a part > > of userland ABI for a while... > > > > 2) two callers in perf_event_open(2) are playing silly buggers with explicit > > struct fd group = {NULL, 0}; > > and rely upon "fdput() is a no-op if we hadn't touched that" (note that if > > we try to touch it and get NULL ->file from fdget(), we do not hit those fdput() > > at all). > > > > 3) ovl_aio_put() is hard to follow (and some of the callers are poking > > where they shouldn't), no idea if it's correct. struct fd is manually > > constructed there, anyway. > > > > 4) bpf generic_map_update_batch() is really asking for trouble. The comment in > > there is wrong: > > f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */ > > *NOTHING* we'd done earlier can guarantee that. We might have a descriptor > > table shared with another thread, and it might have very well done dup2() since > > the last time we'd looked things up. IOW, this fdget() is racy - the function > > assumes it refers to the same thing that gave us map back in bpf_map_do_batch(), > > but it's not guaranteed at all. > > > > I hadn't put together a reproducer, but that code is very suspicious. As a general > > rule, you should treat descriptor table as shared object, modifiable by other > > threads. It can be explicitly locked and it can be explicitly unshared, but > > short of that doing a lookup for the same descriptor twice in a row can yield > > different results. > > > > What's going on there? Do you really want the same struct file you've got back in > > bpf_map_do_batch() (i.e. the one you've got the map from)? What should happen > > if the descriptor changes its meaning during (or after) the operation? > > Interesting. > If I got this right... in the following: > > f = fdget(ufd); > map = __bpf_map_get(f); > if (IS_ERR(map)) > return PTR_ERR(map); > ... > f = fdget(ufd); > here there are no guarantees that 'f' is valid and points > to the same map. > Argh. In hindsight that makes sense. > > generic_map_update_batch calls bpf_map_update_value. > That could use 'f' for prog_array, fd_array and hash_of_maps > types of maps. > The first two types don't' define .map_update_batch callback. > So BPF_DO_BATCH(map->ops->map_update_batch); will error out > with -ENOTSUPP since that callback is NULL for that map type. > > The hash_of_maps does seem to support it, but > that's an odd one to use with batch access. > > Anyhow we certainly need to clean this up. > > Brian, > do you mind fixing it up, since you've added that > secondary fdget() in the first place? > > Thanks!