On Mon, Oct 15, 2012 at 7:44 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> Hello,
>
> On Fri, Sep 28, 2012 at 10:32:20AM -0700, Luigi Semenzato wrote:
>> Greetings,
>>
>> We are experimenting with zram in Chrome OS.  It works quite well
>> until the system runs out of memory, at which point it seems to hang,
>> but we suspect it is thrashing.
>>
>> Before the (apparent) hang, the OOM killer gets rid of a few
>> processes, but then the other processes gradually stop responding,
>> until the entire system becomes unresponsive.
>
> Why do you think it's zram problem? If you use swap device as storage
> instead of zram, does the problem disappear?

I haven't tried with a swap device, but that is a good suggestion.  I
didn't want to swap to disk (too slow compared to zram, so it would no
longer be the same experiment), but I could preallocate a RAM disk and
swap to that.

> Could you do sysrq+t,m several time and post it while hang happens?
> /proc/vmstat could be helpful, too.

The stack traces look mostly like this:

[ 2058.069020]  [<810681c4>] handle_edge_irq+0x8f/0xb1
[ 2058.069028]  <IRQ>  [<810037ed>] ? do_IRQ+0x3f/0x98
[ 2058.069044]  [<813b7eb0>] ? common_interrupt+0x30/0x38
[ 2058.069058]  [<8108007b>] ? ftrace_raw_event_rpm_internal+0xf/0x108
[ 2058.069072]  [<81196c1a>] ? do_raw_spin_lock+0x93/0xf3
[ 2058.069085]  [<813b70d5>] ? _raw_spin_lock+0xd/0xf
[ 2058.069097]  [<810b418c>] ? put_super+0x15/0x29
[ 2058.069108]  [<810b41ba>] ? drop_super+0x1a/0x1d
[ 2058.069119]  [<810b4d04>] ? prune_super+0x106/0x110
[ 2058.069132]  [<81093647>] ? shrink_slab+0x7f/0x22f
[ 2058.069144]  [<81095943>] ? try_to_free_pages+0x1b7/0x2e6
[ 2058.069158]  [<8108de27>] ? __alloc_pages_nodemask+0x412/0x5d5
[ 2058.069173]  [<810a9c6a>] ? read_swap_cache_async+0x4a/0xcf
[ 2058.069185]  [<810a9d50>] ? swapin_readahead+0x61/0x8d
[ 2058.069198]  [<8109fea0>] ? handle_pte_fault+0x310/0x5fb
[ 2058.069208]  [<8100223a>] ? do_signal+0x470/0x4fe
[ 2058.069220]  [<810a02cc>] ? handle_mm_fault+0xae/0xbd
[ 2058.069233]  [<8101d0f9>] ? do_page_fault+0x265/0x284
[ 2058.069247]  [<81192b32>] ? copy_to_user+0x3e/0x49
[ 2058.069257]  [<8100306d>] ? do_spurious_interrupt_bug+0x26/0x26
[ 2058.069270]  [<81009279>] ? init_fpu+0x73/0x81
[ 2058.069280]  [<8100275e>] ? math_state_restore+0x1f/0xa0
[ 2058.069290]  [<8100306d>] ? do_spurious_interrupt_bug+0x26/0x26
[ 2058.069303]  [<8101ce94>] ? vmalloc_sync_all+0xa/0xa
[ 2058.069315]  [<813b7737>] ? error_code+0x67/0x6c

The bottom part of the stack varies, but most processes are spending a
lot of time in prune_super().  There is a fairly high number of mounted
file systems, and do_try_to_free_pages() keeps calling shrink_slab()
even when there is nothing left to reclaim there.  The frames above
(put_super() calling into do_raw_spin_lock()) suggest the direct
reclaimers are serializing on the global sb_lock spinlock as each of
them walks every superblock.

In addition, do_try_to_free_pages() keeps returning 1 because
all_unreclaimable() at the end is always false: the allocator thinks
that zone 1 has freeable pages (zones 0 and 2 do not), and that
prevents it from OOMing.  I went into some more depth but didn't quite
untangle everything that goes on.  In any case, this is why I came up
with the theory that somehow mm is too optimistic about how many pages
are freeable.

Then I found what looks like a smoking gun in vmscan.c:

        if (nr_swap_pages > 0)
                nr += zone_page_state(zone, NR_ACTIVE_ANON) +
                      zone_page_state(zone, NR_INACTIVE_ANON);

which seems to ignore that not all anon pages are freeable when swap
space is limited.
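If that reading is correct, one possible shape of a fix (untested, and
only a sketch against the zone_reclaimable_pages() that the snippet
above comes from) would be to cap the anon contribution by the swap
space actually left:

        if (nr_swap_pages > 0) {
                unsigned long anon;

                anon = zone_page_state(zone, NR_ACTIVE_ANON) +
                       zone_page_state(zone, NR_INACTIVE_ANON);
                /*
                 * nr_swap_pages counts free swap slots globally while
                 * the vmstat counters are per-zone, so this still
                 * over-counts per zone, but it at least stops counting
                 * anon pages that can never be swapped out once swap
                 * is (nearly) full.
                 */
                nr += min_t(unsigned long, anon, nr_swap_pages);
        }

I have not tried this; it only captures the idea that anon pages with
no free swap slots to go to should not count as reclaimable.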
Pretty much all processes hang while trying to allocate memory; those
that don't allocate memory keep running fine.

vmstat 1 shows a large amount of swapping activity, which drops to 0
when the processes hang.  /proc/meminfo and /proc/vmstat are at the
bottom.

>
>>
>> I am wondering if anybody has run into this.  Thanks!
>>
>> Luigi
>>
>> P.S. For those who wish to know more:
>>
>> 1. We use the min_filelist_kbytes patch
>> (http://lwn.net/Articles/412313/) (I am not sure if it made it into
>> the standard kernel) and set min_filelist_kbytes to 50Mb.  (This may
>> not matter, as it's unlikely to make things worse.)
>
> One of the problem I look at this patch is it might prevent
> increasing of zone->pages_scanned when the swap if full or anon pages
> are very small although there are lots of file-backed pages.
> It means OOM can't occur and page allocator could loop forever.
> Please look at zone_reclaimable.

Yes, I think you are right.  It didn't matter to us because we don't
use swap.  The problem looks fixable.
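For reference, here is the check in question as I read it in this
kernel (quoted from memory, so treat it as a sketch):

        static bool zone_reclaimable(struct zone *zone)
        {
                return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
        }

A zone stays "reclaimable" until it has been scanned six times over, so
if the patch skips the scans that would bump zone->pages_scanned, this
can remain true forever and the OOM killer never triggers, as you
describe.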
> Have you ever test it without above patch?

Good suggestion; I just tried it.  Almost all text pages are evicted,
and then the system thrashes so badly that the hang detector kicks in
after a couple of minutes and panics.

Thank you for the very helpful suggestions!

>
>>
>> 2. We swap only to compressed ram.  The setup is very simple:
>>
>>     echo ${ZRAM_SIZE_KB}000 >/sys/block/zram0/disksize ||
>>       logger -t "$UPSTART_JOB" "failed to set zram size"
>>     mkswap /dev/zram0 || logger -t "$UPSTART_JOB" "mkswap /dev/zram0 failed"
>>     swapon /dev/zram0 || logger -t "$UPSTART_JOB" "swapon /dev/zram0 failed"
>>
>> For ZRAM_SIZE_KB, we typically use 1.5x the size of RAM (which is 2
>> or 4 GB).  The compression factor is about 3:1.  The hangs happen for
>> quite a wide range of zram sizes.
>
> --
> Kind Regards,
> Minchan Kim

MemTotal:        2002292 kB
MemFree:           15148 kB
Buffers:             260 kB
Cached:           169952 kB
SwapCached:       149448 kB
Active:           722608 kB
Inactive:         290824 kB
Active(anon):     682680 kB
Inactive(anon):   230888 kB
Active(file):      39928 kB
Inactive(file):    59936 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:         74504 kB
HighFree:              0 kB
LowTotal:        1927788 kB
LowFree:           15148 kB
SwapTotal:       2933044 kB
SwapFree:          47968 kB
Dirty:                 0 kB
Writeback:            56 kB
AnonPages:        695180 kB
Mapped:            73276 kB
Shmem:             70276 kB
Slab:              19596 kB
SReclaimable:       9152 kB
SUnreclaim:        10444 kB
KernelStack:        1448 kB
PageTables:         9964 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3934188 kB
Committed_AS:    4371740 kB
VmallocTotal:     122880 kB
VmallocUsed:       22268 kB
VmallocChunk:     100340 kB
DirectMap4k:       34808 kB
DirectMap2M:     1927168 kB

nr_free_pages 3776
nr_inactive_anon 58243
nr_active_anon 172106
nr_inactive_file 14984
nr_active_file 9982
nr_unevictable 0
nr_mlock 0
nr_anon_pages 174840
nr_mapped 18387
nr_file_pages 80762
nr_dirty 0
nr_writeback 13
nr_slab_reclaimable 2290
nr_slab_unreclaimable 2611
nr_page_table_pages 2471
nr_kernel_stack 180
nr_unstable 0
nr_bounce 0
nr_vmscan_write 679247
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 416
nr_isolated_file 0
nr_shmem 17637
nr_dirtied 7630
nr_written 686863
nr_anon_transparent_hugepages 0
nr_dirty_threshold 151452
nr_dirty_background_threshold 2524
pgpgin 284189
pgpgout 2748940
pswpin 5602
pswpout 679271
pgalloc_dma 9976
pgalloc_normal 1426651
pgalloc_high 34659
pgalloc_movable 0
pgfree 1475099
pgactivate 58092
pgdeactivate 745734
pgfault 1489876
pgmajfault 1098
pgrefill_dma 8557
pgrefill_normal 742123
pgrefill_high 4088
pgrefill_movable 0
pgsteal_kswapd_dma 199
pgsteal_kswapd_normal 48387
pgsteal_kswapd_high 2443
pgsteal_kswapd_movable 0
pgsteal_direct_dma 7688
pgsteal_direct_normal 652670
pgsteal_direct_high 6242
pgsteal_direct_movable 0
pgscan_kswapd_dma 268
pgscan_kswapd_normal 105036
pgscan_kswapd_high 8395
pgscan_kswapd_movable 0
pgscan_direct_dma 185240
pgscan_direct_normal 23961886
pgscan_direct_high 584047
pgscan_direct_movable 0
pginodesteal 123
slabs_scanned 10368
kswapd_inodesteal 1
kswapd_low_wmark_hit_quickly 15
kswapd_high_wmark_hit_quickly 8
kswapd_skip_congestion_wait 639
pageoutrun 582
allocstall 14514
pgrotated 1
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 1
unevictable_pgs_mlocked 1
unevictable_pgs_munlocked 1
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0
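Two quick computations on the numbers above: SwapFree is 47968 kB out
of SwapTotal 2933044 kB (about 98% full), while anon pages total
682680 + 230888 = 913568 kB, so most anon pages indeed have nowhere to
go.  Also, direct reclaim scanned 23961886 pages in the normal zone
(pgscan_direct_normal) to reclaim only 652670 (pgsteal_direct_normal),
an efficiency of roughly 2.7%, which fits the thrashing theory.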