On 10/18/19 8:52 AM, Jann Horn wrote: > On Fri, Oct 18, 2019 at 4:43 PM Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> On 10/18/19 8:40 AM, Jann Horn wrote: >>> On Fri, Oct 18, 2019 at 4:37 PM Jens Axboe <axboe@xxxxxxxxx> wrote: >>>> >>>> On 10/18/19 8:34 AM, Jann Horn wrote: >>>>> On Fri, Oct 18, 2019 at 4:01 PM Jens Axboe <axboe@xxxxxxxxx> wrote: >>>>>> On 10/17/19 8:41 PM, Jann Horn wrote: >>>>>>> On Fri, Oct 18, 2019 at 4:01 AM Jens Axboe <axboe@xxxxxxxxx> wrote: >>>>>>>> This is in preparation for adding opcodes that need to modify files >>>>>>>> in a process file table, either adding new ones or closing old ones. >>>>> [...] >>>>>> Updated patch1: >>>>>> >>>>>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=df6caac708dae8ee9a74c9016e479b02ad78d436 >>>>> >>>>> I don't understand what you're doing with old_files in there. In the >>>>> "s->files && !old_files" branch, "current->files = s->files" happens >>>>> without holding task_lock(), but current->files and s->files are also >>>>> the same already at that point anyway. And what's the intent behind >>>>> assigning stuff to old_files inside the loop? Isn't that going to >>>>> cause the workqueue to keep a modified current->files beyond the >>>>> runtime of the work? >>>> >>>> I simply forgot to remove the old block, it should only have this one: >>>> >>>> if (s->files && s->files != cur_files) { >>>> task_lock(current); >>>> current->files = s->files; >>>> task_unlock(current); >>>> if (cur_files) >>>> put_files_struct(cur_files); >>>> cur_files = s->files; >>>> } >>> >>> Don't you still need a put_files_struct() in the case where "s->files >>> == cur_files"? >> >> I want to hold on to the files for as long as I can, to avoid unnecessary >> shuffling of it. But I take it your worry here is that we'll be calling >> something that manipulates ->files? Nothing should do that, unless >> s->files is set. We didn't hide the workqueue ->files[] before this >> change either. > > No, my worry is that the refcount of the files_struct is left too > high. From what I can tell, the "do" loop in io_sq_wq_submit_work() > iterates over multiple instances of struct sqe_submit. If there are > two sqe_submit instances with the same ->files (each holding a > reference from the get_files_struct() in __io_queue_sqe()), then: > > When processing the first sqe_submit instance, current->files and > cur_files are set to $user_files. > When processing the second sqe_submit instance, nothing happens > (s->files == cur_files). > After the loop, at the end of the function, put_files_struct() is > called once on $user_files. > > So get_files_struct() has been called twice, but put_files_struct() > has only been called once. That leaves the refcount too high, and by > repeating this, an attacker can make the refcount wrap around and then > cause a use-after-free. Ah now I see what you are getting at, yes that's clearly a bug! I wonder how we best safely can batch the drops. We can track the number of times we've used the same files, and do atomic_sub_and_test() in a put_files_struct_many() type addition. But that would leave us open to the issue you describe, where someone could maliciously overflow the files ref count. Probably not worth over-optimizing, as long as we can avoid the current->files task lock/unlock and shuffle. I'll update the patch. -- Jens Axboe