Dave Chinner wrote:
> So why have we only scanned *176 pages* during reclaim? On other
> OOM reports in this trace it's as low as 12. Either that stat is
> completely wrong, or we're not doing sufficient page LRU reclaim
> scanning....
>
> > [ 9662.234685] MemAlloc-Info: 3 stalling task, 0 dying task, 0 victim task.
> >
> > vmstat_update() and submit_flushes() remained pending for about 110 seconds.
> > If xlog_cil_push_work() were spinning inside GFP_NOFS allocation, it should be
> > reported as MemAlloc: traces, but no such lines are recorded. I don't know why
> > xlog_cil_push_work() did not call schedule() for so long.
>
> I'd say it is repeatedly waiting for IO completion on log buffers to
> write out the checkpoint. It's making progress, just if it's taking
> multiple second per journal IO it will take a long time to write a
> checkpoint. All the other blocked tasks in XFS inode reclaim are
> either waiting directly on IO completion or waiting for the log to
> complete a flush, so this really just looks like an overloaded IO
> subsystem to me....

The vmstat statistics can become wrong when the vmstat_update() workqueue
item cannot be processed because an in-flight workqueue item does not call
schedule(). Once the in-flight workqueue item (in this case
xlog_cil_push_work()) calls schedule(), the pending vmstat_update()
workqueue item is processed and the vmstat statistics become up to date.

If, as you expect, xlog_cil_push_work() was waiting for IO completion on log
buffers rather than spinning inside GFP_NOFS allocation, what should have
happened is that xlog_cil_push_work() called schedule() and vmstat_update()
was processed. But vmstat_update() remained pending for about 110 seconds.
That's strange...

Arkadiusz is trying http://marc.info/?l=linux-mm&m=144725782107096&w=2 ,
which makes sure that the vmstat_update() workqueue item is processed by
changing wait_iff_congested() to call schedule(), and we are waiting for
test results.

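The workqueue reasoning above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: run_pool(), spinner(), sleeper() and stat_update() are hypothetical names, one "tick" loosely stands for one second, and the model assumes a single worker that hands over the CPU only when the in-flight item sleeps or finishes.

```python
from collections import deque

def spinner(ticks):
    # In-flight work that never calls schedule(): it stays busy every tick.
    for _ in range(ticks):
        yield "busy"

def sleeper(ticks):
    # In-flight work that sleeps (e.g. waits for IO) on every tick.
    for _ in range(ticks):
        yield "sleep"

def stat_update():
    # Stand-in for a short work item such as a statistics refresh.
    yield "busy"

def run_pool(items, max_ticks):
    """Return the tick at which each (name, generator) item first ran."""
    queue = deque(items)
    first_run = {}
    tick = 0
    while queue and tick < max_ticks:
        name, gen = queue.popleft()
        first_run.setdefault(name, tick)
        # The in-flight item keeps the CPU until it sleeps or finishes.
        while tick < max_ticks:
            try:
                state = next(gen)
            except StopIteration:
                break
            tick += 1
            if state == "sleep":
                queue.append((name, gen))  # pending items may run now
                break
    return first_run

spinning = run_pool([("push_work", spinner(110)),
                     ("vmstat_update", stat_update())], 1000)
sleeping = run_pool([("push_work", sleeper(110)),
                     ("vmstat_update", stat_update())], 1000)
print(spinning["vmstat_update"])  # 110: delayed until the spinner finishes
print(sleeping["vmstat_update"])  # 1: runs as soon as the in-flight item sleeps
```

In this model a 110-tick delay of the pending item only appears when the in-flight item never sleeps, which is why the observed delay is hard to reconcile with xlog_cil_push_work() waiting for IO.
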
Well, one of the dependent patches, "vmstat: explicitly schedule per-cpu
work on the CPU we need it to run on", might be relevant to this problem.
If http://sprunge.us/GYBb and http://sprunge.us/XWUX solve the problem
(both with swap and without swap), then the vmstat statistics were wrong.