On Sat, May 31, 2014 at 6:06 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 2014-05-28 20:42, Linus Torvalds wrote:
>>>
>>> Regardless of whether it is swap or something external queues the
>>> bio on the plug, perhaps we should look at why it's done inline
>>> rather than by kblockd, where it was moved because it was blowing
>>> the stack from schedule():
>>
>>
>> So it sounds like we need to do this for io_schedule() too.
>>
>> In fact, we've generally found it to be a mistake every time we
>> "automatically" unblock some IO queue. And I'm not saying that because
>> of stack space, but because we've _often_ had the situation that eager
>> unblocking results in IO that could have been done as bigger requests.
>
>
> We definitely need to auto-unplug on the schedule path, otherwise we run
> into all sorts of trouble. But making it async off the IO schedule path is
> fine. By definition, it's not latency sensitive if we are hitting unplug on
> schedule. I'm pretty sure it was run inline due to CPU concerns here, as
> running inline is certainly cheaper than punting to kblockd.
>
>
>> Looking at that callchain, I have to say that ext4 doesn't look
>> horrible compared to the whole block layer and virtio. Yes,
>> "ext4_writepages()" is using almost 400 bytes of stack, and most of
>> that seems to be due to:
>>
>>         struct mpage_da_data mpd;
>>         struct blk_plug plug;
>
>
> Plus blk_plug is pretty tiny as it is. I queued up a patch to kill the magic
> part of it, since that's never caught any bugs. Only saves 8 bytes, but may
> as well take that. Especially if we end up with nested plugs.

In the case of nested plugs, only the first (outermost) one is actually used,
right? If so, the plug could be embedded into task_struct together with an
integer recursion counter. That would save a bit of precious stack and make
the code look cleaner. (A rough sketch of what I mean is appended at the end
of this mail.)

>
>
>> Well, we've definitely had some issues with deeper callchains
>> with md, but I suspect virtio might be worse, and the new blk-mq code
>> is likely worse in this respect too.
>
>
> I don't think blk-mq is worse than the older stack, in fact it should be
> better. The call chains are shorter, and a lot less cruft on the stack.
> Historically the stack issues have been nested devices, however. And for
> sync IO, we do run it inline, so if the driver chews up a lot of stack,
> well...
>
> Looks like I'm late here and the decision has been made to go 16K stacks,
> which I think is a good one. We've been living on the edge (and sometimes
> over) for heavy dm/md setups for a while, and have been patching around that
> fact in the IO stack for years.
>
>
> --
> Jens Axboe
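
To make the suggestion above concrete, here is a rough, untested sketch of
how nested plugs could be handled if the plug lived in task_struct. It is
only an illustration of the idea, not a real patch: it assumes task_struct's
"plug" becomes an embedded struct blk_plug instead of a pointer and gains a
hypothetical "plug_depth" nesting counter, and it drops the on-stack plug
argument from the two entry points. Field and helper names are made up here.

	/*
	 * Sketch only.  Assumed task_struct changes (names hypothetical):
	 *
	 *	struct blk_plug plug;		// was: struct blk_plug *plug
	 *	int		plug_depth;	// nesting counter
	 */

	void blk_start_plug(void)
	{
		struct task_struct *tsk = current;

		/* Only the outermost caller sets up the plug lists. */
		if (tsk->plug_depth++ == 0) {
			INIT_LIST_HEAD(&tsk->plug.list);
			INIT_LIST_HEAD(&tsk->plug.cb_list);
		}
		/* Nested callers just bump the counter and reuse the outer plug. */
	}

	void blk_finish_plug(void)
	{
		struct task_struct *tsk = current;

		/* Only the outermost blk_finish_plug() actually flushes. */
		if (--tsk->plug_depth == 0)
			blk_flush_plug_list(&tsk->plug, false);
	}

With something like this, callers no longer declare a struct blk_plug on the
stack at all, and the schedule/io_schedule paths could still flush (or punt
to kblockd) by looking at current->plug_depth instead of current->plug.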