On 10/18/19 10:36 AM, Jens Axboe wrote: >> Ignoring the locking elision, basically the logic is now this: >> >> static void io_sq_wq_submit_work(struct work_struct *work) >> { >> struct io_kiocb *req = container_of(work, struct io_kiocb, work); >> struct files_struct *cur_files = NULL, *old_files; >> [...] >> old_files = current->files; >> [...] >> do { >> struct sqe_submit *s = &req->submit; >> [...] >> if (cur_files) >> /* drop cur_files reference; borrow lifetime must >> * end before here */ >> put_files_struct(cur_files); >> /* move reference ownership to cur_files */ >> cur_files = s->files; >> if (cur_files) { >> task_lock(current); >> /* current->files borrows reference from cur_files; >> * existing borrow from previous loop ends here */ >> current->files = cur_files; >> task_unlock(current); >> } >> >> [call __io_submit_sqe()] >> [...] >> } while (req); >> [...] >> /* existing borrow ends here */ >> task_lock(current); >> current->files = old_files; >> task_unlock(current); >> if (cur_files) >> /* drop cur_files reference; borrow lifetime must >> * end before here */ >> put_files_struct(cur_files); >> } >> >> If you run two iterations of this loop, with a first element that has >> a ->files pointer and a second element that doesn't, then in the >> second run through the loop, the reference to the files_struct will be >> dropped while current->files still points to it; current->files is >> only reset after the loop has ended. If someone accesses >> current->files through procfs directly after that, AFAICS you'd get a >> use-after-free. > > Amazing how this is still broken. You are right, and it's especially > annoying since that's exactly the case I originally talked about (not > flipping current->files if we don't have to). I just did it wrong, so > we'll leave a dangling pointer in ->files. > > The by far most common case is if one sqe has a files it needs to > attach, then others that also have files will be the same set. So I want > to optimize for the case where we only flip current->files once when we > see the files, and once when we're done with the loop. > > Let me see if I can get this right... I _think_ the simplest way to do it is simply to have both cur_files and current->files hold a reference to the file table. That won't really add any extra cost as the double increments / decrements are following each other. Something like this incremental, totally untested. diff --git a/fs/io_uring.c b/fs/io_uring.c index 2fed0badad38..b3cf3f3d7911 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2293,9 +2293,14 @@ static void io_sq_wq_submit_work(struct work_struct *work) put_files_struct(cur_files); cur_files = s->files; if (cur_files && cur_files != current->files) { + struct files_struct *old; + + atomic_inc(&cur_files->count); task_lock(current); + old = current->files; current->files = cur_files; task_unlock(current); + put_files_struct(old); } if (!ret) { @@ -2390,9 +2395,13 @@ static void io_sq_wq_submit_work(struct work_struct *work) mmput(cur_mm); } if (old_files != current->files) { + struct files_struct *old; + task_lock(current); + old = current->files; current->files = old_files; task_unlock(current); + put_files_struct(old); } if (cur_files) put_files_struct(cur_files); -- Jens Axboe