On 4/18/22 4:12 PM, Jens Axboe wrote:
> On 4/18/22 4:01 PM, Linus Torvalds wrote:
>> On Mon, Apr 18, 2022 at 2:16 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> So as far as I can tell, we really have two options:
>>>
>>> 1) Don't preempt a task that has a plug active
>>> 2) Flush for any schedule out, not just going to sleep
>>>
>>> 1 may not be feasible if we're queueing lots of IO, which then leaves 2.
>>> Linus, do you remember what your original patch here was motivated by?
>>> I'm assuming it was an effiency thing, but do we really have a lot of
>>> cases of IO submissions being preempted a lot and hence making the plug
>>> less efficient than it should be at merging IO? Seems unlikely, but I
>>> could be wrong.
>>
>> No, it goes all the way back to 2011, my memory for those kinds of
>> details doesn't go that far back.
>>
>> That said, it clearly is about preemption, and I wonder if we had an
>> actual bug there.
>>
>> IOW, it might well not just in the "gather up more IO for bigger
>> requests" thing, but about "the IO plug is per-thread and doesn't have
>> locking because of that".
>>
>> So doing plug flushing from a preemptible kernel context might race
>> with it all being set up.
>
> Hmm yes. But doesn't preemption imply a full barrier? As long as we
> assign the plug at the end, we should be fine. And just now looking that
> up, there's even already a comment to that effect in blk_start_plug().
> So barring any weirdness with that, maybe that's the solution.
>
> Your comment did jog my memory a bit though, and I do in fact think it
> was something related to that that made is change it. I'll dig through
> some old emails and see if I can find it.

Here's the thread:

https://lore.kernel.org/all/1295659049-2688-6-git-send-email-jaxboe@xxxxxxxxxxxx/

I'll dig through it in a bit, but here's your reasoning for why it
should not flush on preemption:

https://lore.kernel.org/all/BANLkTikBEJa7bJJoLFU7NoiEgOjVHVG08A@xxxxxxxxxxxxxx/

-- 
Jens Axboe