On Mon, Apr 18, 2022 at 2:16 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> So as far as I can tell, we really have two options:
>
> 1) Don't preempt a task that has a plug active
> 2) Flush for any schedule out, not just going to sleep
>
> 1 may not be feasible if we're queueing lots of IO, which then leaves 2.
> Linus, do you remember what your original patch here was motivated by?
> I'm assuming it was an efficiency thing, but do we really have a lot of
> cases of IO submissions being preempted a lot and hence making the plug
> less efficient than it should be at merging IO? Seems unlikely, but I
> could be wrong.

No, it goes all the way back to 2011, and my memory for those kinds of
details doesn't go that far back.

That said, it clearly is about preemption, and I wonder if we had an
actual bug there.

IOW, it might well not just be about the "gather up more IO for bigger
requests" thing, but about "the IO plug is per-thread and doesn't have
locking because of that".

So doing plug flushing from a preemptible kernel context might race
with it all being set up.

Explicit io_schedule() etc obviously doesn't have that issue.

              Linus