[ Crossed emails ]

On Wed, May 28, 2014 at 6:58 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, May 29, 2014 at 11:30:07AM +1000, Dave Chinner wrote:
>>
>> And now we have too deep a stack due to unplugging from io_schedule()...
>
> So, if we make io_schedule() push the plug list off to the kblockd
> like is done for schedule()....

We might have a few different cases.

The cases where we *do* care about latency is when we are waiting for
the IO ourselves (ie in wait_on_page() and friends), and those end up
using io_schedule() too.

So in *that* case we definitely have a latency argument for doing it
directly, and we shouldn't kick it off to kblockd. That's very much a
"get this started as soon as humanly possible".

But the "wait_iff_congested()" code that also uses io_schedule()
should push it out to kblockd, methinks.

>> This stack overflow shows us that just the memory reclaim + IO
>> layers are sufficient to cause a stack overflow,
>
> .... we solve this problem directly by being able to remove the IO
> stack usage from the direct reclaim swap path.
>
> IOWs, we don't need to turn swap off at all in direct reclaim
> because all the swap IO can be captured in a plug list and
> dispatched via kblockd. This could be done either by io_schedule()
> or a new blk_flush_plug_list() wrapper that pushes the work to
> kblockd...

That would work. That said, I personally would not mind to see that
"swap is special" go away, if possible. Because it can be behind a
filesystem too. Christ, even NFS (and people used to fight that tooth
and nail!) is back as a swap target..

                Linus