Re: OOM killer changes

On Mon 15-08-16 11:16:36, Vlastimil Babka wrote:
> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
> > On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
> >>
> > Took me a little longer than expected due to work. The failure wouldn't 
> > happen for a while and so I started a couple of scripts and let them 
> > run. When I checked today the server didn't respond on the network and 
> > sure enough it had killed everything. This is with 4.7.0 with the config 
> > based on Debian 4.7-rc7.
> > 
> > trace_pipe got a little big (5GB) so I uploaded the logs to 
> > https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is 
> > before the btrfs filesystems were mounted.
> > I did run a btrfs balance because it creates IO load and I needed to 
> > balance anyway. Maybe that's what caused it?
> 
> pgmigrate_success        46738962
> pgmigrate_fail          135649772
> compact_migrate_scanned 309726659
> compact_free_scanned   9715615169
> compact_isolated        229689596
> compact_stall 4777
> compact_fail 3068
> compact_success 1709
> compact_daemon_wake 207834
> 
> The migration failures are quite enormous. Very quick analysis of the
> trace seems to confirm that these are mostly "real", as opposed to result
> of failure to isolate free pages for migration targets, although the free
> scanner spent a lot of time:
> 
> > grep "nr_failed=32" -B1 trace_pipe.log | grep isolate_freepages.*nr_taken=0 | wc -l
> 3246
> 
> So is it one of the cases where fs is unable to migrate dirty/writeback pages?

It smells that way. Now we should find out why, and what we can do about
it. I suspect that try_to_release_page is not able to release the page
for migration. Btrfs doesn't seem to have a migratepage callback for page
cache pages, so it should go via fallback_migrate_page.
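
For a sense of scale, the counters quoted above can be turned into failure
percentages. This is only a rough sketch: fail_pct is a made-up helper name,
and the numbers are copied by hand from the vmstat excerpt rather than read
from /proc/vmstat.

```shell
#!/bin/sh
# Sketch: share of failed attempts for the counters quoted above.
# fail_pct is a hypothetical helper, not part of any kernel tooling.
fail_pct() {
	# $1 = failures, $2 = successes; prints failures as a percentage
	# of all attempts, with one decimal place.
	awk -v f="$1" -v s="$2" 'BEGIN { printf "%.1f\n", 100 * f / (f + s) }'
}

# pgmigrate_fail vs. pgmigrate_success
echo "pgmigrate fail: $(fail_pct 135649772 46738962)%"
# compact_fail vs. compact_success (together they add up to compact_stall)
echo "compact fail:   $(fail_pct 3068 1709)%"
```

That works out to roughly three out of four page migrations failing, and
close to two thirds of the direct compaction attempts.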

The following diff should tell us whether try_to_release_page really is
what fails here. With it applied, just read trace_pipe and see whether
this path actually triggers.
---
diff --git a/mm/migrate.c b/mm/migrate.c
index 72c09dea6526..120e2e5fcbea 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -729,8 +729,10 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 * We must have no buffers or drop them.
 	 */
 	if (page_has_private(page) &&
-	    !try_to_release_page(page, GFP_KERNEL))
+	    !try_to_release_page(page, GFP_KERNEL)) {
+		trace_printk("try_to_release_page failed for a_ops:%pS\n", mapping->a_ops);
 		return -EAGAIN;
+	}
 
 	return migrate_page(mapping, newpage, page, mode);
 }
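
Once a kernel with the diff above is running, something along these lines
might be enough to check whether the new message shows up and for which
a_ops. It is a sketch under assumptions: count_release_failures is an
illustrative helper name, and the trace_pipe path assumes tracefs is
reachable under /sys/kernel/debug/tracing (adjust if it is mounted
elsewhere).

```shell
#!/bin/sh
# Sketch: summarize hits of the trace_printk added by the diff above.
# The path is an assumption; tracefs may be mounted somewhere else.
TRACE_PIPE=${TRACE_PIPE:-/sys/kernel/debug/tracing/trace_pipe}

# count_release_failures (hypothetical helper) reads trace text on stdin
# and prints how often each a_ops symbol appears in the new message,
# most frequent first.
count_release_failures() {
	grep 'try_to_release_page failed for a_ops:' |
		sed 's/.*a_ops://' |
		sort | uniq -c | sort -rn
}

# Against a saved log, such as the trace_pipe.log used earlier:
#   count_release_failures < trace_pipe.log
# Live, sampling for a minute (needs root):
#   timeout 60 cat "$TRACE_PIPE" | count_release_failures
```

Empty output would mean this path was not hit during the sampled window,
which would point away from try_to_release_page as the culprit.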
-- 
Michal Hocko
SUSE Labs
