On Mon 15-08-16 11:16:36, Vlastimil Babka wrote:
> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
> > On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
> >>
> > Took me a little longer than expected due to work. The failure wouldn't
> > happen for a while and so I started a couple of scripts and let them
> > run. When I checked today the server didn't respond on the network and
> > sure enough it had killed everything. This is with 4.7.0 with the config
> > based on Debian 4.7-rc7.
> >
> > trace_pipe got a little big (5GB) so I uploaded the logs to
> > https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
> > before the btrfs filesystems were mounted.
> > I did run a btrfs balance because it creates IO load and I needed to
> > balance anyway. Maybe that's what caused it?
>
> pgmigrate_success 46738962
> pgmigrate_fail 135649772
> compact_migrate_scanned 309726659
> compact_free_scanned 9715615169
> compact_isolated 229689596
> compact_stall 4777
> compact_fail 3068
> compact_success 1709
> compact_daemon_wake 207834
>
> The migration failures are quite enormous. Very quick analysis of the
> trace seems to confirm that these are mostly "real", as opposed to result
> of failure to isolate free pages for migration targets, although the free
> scanner spent a lot of time:
>
> > grep "nr_failed=32" -B1 trace_pipe.log | grep isolate_freepages.*nr_taken=0 | wc -l
> 3246
>
> So is it one of the cases where fs is unable to migrate dirty/writeback pages?

It smells that way. Now we should find out why and what we can do about
that. I suspect that try_to_release_page is not able to release the page
for migration. Btrfs doesn't seem to have a migratepage callback for page
cache pages, so it should go via fallback_migrate_page. The following
diff should tell us whether this is really the case. Just open
trace_pipe and see whether this path really triggered.
---
diff --git a/mm/migrate.c b/mm/migrate.c
index 72c09dea6526..120e2e5fcbea 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -729,8 +729,10 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 * We must have no buffers or drop them.
 	 */
 	if (page_has_private(page) &&
-	    !try_to_release_page(page, GFP_KERNEL))
+	    !try_to_release_page(page, GFP_KERNEL)) {
+		trace_printk("try_to_release_page failed for a_ops:%pS\n", mapping->a_ops);
 		return -EAGAIN;
+	}
 
 	return migrate_page(mapping, newpage, page, mode);
 }
-- 
Michal Hocko
SUSE Labs
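
To summarize the result afterwards, something like the following rough
sketch might help. It assumes the trace was saved to a file (as with
trace_pipe.log in the grep quoted above) and that the trace_printk
message from the diff appears verbatim in it. The function name
count_release_failures is made up for illustration; it only counts hits
per a_ops symbol, most frequent first.

```shell
# Count "try_to_release_page failed" hits per a_ops symbol.
# $1 is the path to a saved copy of the trace (an assumption; adjust as needed).
count_release_failures() {
    # Keep only the debugging message, strip everything up to the symbol,
    # then tally identical symbols and sort by descending count.
    grep -o 'try_to_release_page failed for a_ops:.*' "$1" \
        | sed 's/.*a_ops://' \
        | sort | uniq -c | sort -rn
}
```

Calling it as "count_release_failures trace_pipe.log" would print one
line per symbol. No output at all would suggest that this path did not
trigger.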