On 4/18/22 4:01 PM, Linus Torvalds wrote:
> On Mon, Apr 18, 2022 at 2:16 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> So as far as I can tell, we really have two options:
>>
>> 1) Don't preempt a task that has a plug active
>> 2) Flush for any schedule out, not just going to sleep
>>
>> 1 may not be feasible if we're queueing lots of IO, which then leaves 2.
>> Linus, do you remember what your original patch here was motivated by?
>> I'm assuming it was an efficiency thing, but do we really have a lot of
>> cases of IO submissions being preempted a lot and hence making the plug
>> less efficient than it should be at merging IO? Seems unlikely, but I
>> could be wrong.
>
> No, it goes all the way back to 2011, my memory for those kinds of
> details doesn't go that far back.
>
> That said, it clearly is about preemption, and I wonder if we had an
> actual bug there.
>
> IOW, it might well be not just about the "gather up more IO for bigger
> requests" thing, but about "the IO plug is per-thread and doesn't have
> locking because of that".
>
> So doing plug flushing from a preemptible kernel context might race
> with it all being set up.

Hmm yes. But doesn't preemption imply a full barrier? As long as we
assign the plug at the end, we should be fine. And just now looking that
up, there's even already a comment to that effect in blk_start_plug().
So barring any weirdness with that, maybe that's the solution.

Your comment did jog my memory a bit though, and I do in fact think it
was something related to that that made us change it. I'll dig through
some old emails and see if I can find it.

> Explicit io_schedule() etc obviously doesn't have that issue.

Right

--
Jens Axboe