On Fri, Oct 18, 2019 at 7:05 PM Jens Axboe <axboe@xxxxxxxxx> wrote: > On 10/18/19 10:36 AM, Jens Axboe wrote: > >> Ignoring the locking elision, basically the logic is now this: > >> > >> static void io_sq_wq_submit_work(struct work_struct *work) > >> { > >> struct io_kiocb *req = container_of(work, struct io_kiocb, work); > >> struct files_struct *cur_files = NULL, *old_files; > >> [...] > >> old_files = current->files; > >> [...] > >> do { > >> struct sqe_submit *s = &req->submit; > >> [...] > >> if (cur_files) > >> /* drop cur_files reference; borrow lifetime must > >> * end before here */ > >> put_files_struct(cur_files); > >> /* move reference ownership to cur_files */ > >> cur_files = s->files; > >> if (cur_files) { > >> task_lock(current); > >> /* current->files borrows reference from cur_files; > >> * existing borrow from previous loop ends here */ > >> current->files = cur_files; > >> task_unlock(current); > >> } > >> > >> [call __io_submit_sqe()] > >> [...] > >> } while (req); > >> [...] > >> /* existing borrow ends here */ > >> task_lock(current); > >> current->files = old_files; > >> task_unlock(current); > >> if (cur_files) > >> /* drop cur_files reference; borrow lifetime must > >> * end before here */ > >> put_files_struct(cur_files); > >> } > >> > >> If you run two iterations of this loop, with a first element that has > >> a ->files pointer and a second element that doesn't, then in the > >> second run through the loop, the reference to the files_struct will be > >> dropped while current->files still points to it; current->files is > >> only reset after the loop has ended. If someone accesses > >> current->files through procfs directly after that, AFAICS you'd get a > >> use-after-free. > > > > Amazing how this is still broken. You are right, and it's especially > > annoying since that's exactly the case I originally talked about (not > > flipping current->files if we don't have to). I just did it wrong, so > > we'll leave a dangling pointer in ->files. > > > > The by far most common case is if one sqe has a files it needs to > > attach, then others that also have files will be the same set. So I want > > to optimize for the case where we only flip current->files once when we > > see the files, and once when we're done with the loop. > > > > Let me see if I can get this right... > > I _think_ the simplest way to do it is simply to have both cur_files and > current->files hold a reference to the file table. That won't really add > any extra cost as the double increments / decrements are following each > other. Something like this incremental, totally untested. > > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index 2fed0badad38..b3cf3f3d7911 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -2293,9 +2293,14 @@ static void io_sq_wq_submit_work(struct work_struct *work) > put_files_struct(cur_files); > cur_files = s->files; > if (cur_files && cur_files != current->files) { > + struct files_struct *old; > + > + atomic_inc(&cur_files->count); > task_lock(current); > + old = current->files; > current->files = cur_files; > task_unlock(current); > + put_files_struct(old); > } > > if (!ret) { > @@ -2390,9 +2395,13 @@ static void io_sq_wq_submit_work(struct work_struct *work) > mmput(cur_mm); > } > if (old_files != current->files) { > + struct files_struct *old; > + > task_lock(current); > + old = current->files; > current->files = old_files; > task_unlock(current); > + put_files_struct(old); > } > if (cur_files) > put_files_struct(cur_files); The only part I still feel a bit twitchy about is this part at the end: if (old_files != current->files) { struct files_struct *old; task_lock(current); old = current->files; current->files = old_files; task_unlock(current); put_files_struct(old); } If it was possible for the initial ->files to be the same as the ->files of a submission, and we got two submissions with first a different files_struct and then our old one, then this branch would not be executed even though it should, which would leave the refcount of the files_struct one too high. But that probably can't happen? Since kernel workers should be running with &init_files (I think?) and that thing is never used for userspace tasks. But still, I'd feel better if you could change it like this: diff --git a/fs/io_uring.c b/fs/io_uring.c index f9f5c70564f0..7673035d6bfe 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2265,6 +2265,7 @@ static void io_sq_wq_submit_work(struct work_struct *work) { struct io_kiocb *req = container_of(work, struct io_kiocb, work); struct files_struct *cur_files = NULL, *old_files; + bool restore_current_files = false; struct io_ring_ctx *ctx = req->ctx; struct mm_struct *cur_mm = NULL; struct async_list *async_list; @@ -2313,6 +2314,7 @@ static void io_sq_wq_submit_work(struct work_struct *work) current->files = cur_files; task_unlock(current); put_files_struct(old); + restore_current_files = true; } if (!ret) { @@ -2406,7 +2408,7 @@ static void io_sq_wq_submit_work(struct work_struct *work) unuse_mm(cur_mm); mmput(cur_mm); } - if (old_files != current->files) { + if (restore_current_files) { struct files_struct *old; task_lock(current); But actually, by the way: Is this whole files_struct thing creating a reference loop? The files_struct has a reference to the uring file, and the uring file has ACCEPT work that has a reference to the files_struct. If the task gets killed and the accept work blocks, the entire files_struct will stay alive, right?