On Tue, May 28, 2024 at 9:36 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Mon, May 27, 2024 at 04:03:56PM +0000, Alice Ryhl wrote: > > > > In other words, if current->files->count is equal to 1 at fdget() time > > > we can skip incrementing refcount. Matching fdput() would need to > > > skip decrement, of course. Note that we must record that (borrowed > > > vs. cloned) in struct fd - the condition cannot be rechecked at fdput() > > > time, since the table that had been shared at fdget() time might no longer > > > be shared by the time of fdput(). > > > > This is great! It matches my understanding. I didn't know the details > > about current->files and task->files. > > > > You should copy this to the kernel documentation somewhere. :) > > Probably, after it's turned into something more coherent - and after > the description of struct fd scope rules is corrected ;-/ > > Correction in question: you _are_ allowed to move reference from > descriptor table while in scope of struct fd; what you are not allowed > is dropping that reference until the end of scope. The patch you are commenting on contains a change to fs/file.c with exactly that correction. I'm not sure if you noticed it - I should probably have put it in its own commit to make it more obvious. > That's what binder_do_fd_close() is about - binder_deferred_fd_close() > is called in a struct fd scope (anything called from ->unlocked_ioctl() > instances is). It *does* remove a struct file reference from > descriptor table: > twcb->file = file_close_fd(fd); > moves that reference to twcb->file, and subsequent > get_file(twcb->file); > filp_close(twcb->file, current->files); > completes detaching it from the table[*] and the reference itself > is dropped via task_work, which is going to be executed on the > way out to userland, definitely after we leave the scope of > struct fd. Yeah. If you look at previous versions of this patchset, it contains Rust code for performing exactly that dance. I was asked to drop it from the patch series, though. > Incidentally, I'm very tempted to unexport close_fd(); in addition to > being a source of bugs when called from ioctl on user-supplied descriptor > it encourages racy crap - just look at e.g. 1819200166ce > "drm/amdkfd: Export DMABufs from KFD using GEM handles", where we > call drm_gem_prime_handle_to_fd(), immediately followed by > dmabuf = dma_buf_get(fd); > close_fd(fd); > dup2() from another thread with guessed descriptor number as target and > you've got a problem... It's not a violation of fdget() use rules > (it is called from ioctl, but descriptor is guaranteed to be different > from the one passed to ioctl(2)), but it's still wrong. Would take > some work, though... Wait, what's going on there? It adds the fd and then immediately removes it again, or? > "Detaching from the table" bit also needs documenting, BTW. If you look > at that thing, you'll see that current->files is converted to fl_owner_t, > which is an opaque pointer. What happens is that dnotify and POSIX lock > use current->files as opaque tags (->dn_owner and ->flc_owner resp.) and > filp_close() (well, filp_flush() these days) needs to be called to > purge all of those associated with given struct file and given tag value. Ah, yes, fl_owner_t being a void pointer rather than having a proper type caused a bug in an early version of Rust Binder ... Alice > That needs to be done between removal of file reference from table and > destruction of the table itself and it guarantees that those opaque references > won't outlive the table (more importantly, don't survive until a different > files_struct instance is allocated at the same address). > > [*] NB: might make sense to export filp_flush(), since that's what this > sequence boils down to. We really need a better identifier, though - > "filp" part is a leftover from OSDI, AFAICT; that's a hungarism for > "file pointer" and it makes no sense. file_flush() would be better, > IMO - or flush_file(), for that matter.