I forgot to say, I also applied David's patch that prevents the
deadlock I was seeing earlier on x86.

On Fri, Nov 2, 2012 at 5:04 PM, Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
> I have better results now from using zram on ARM.
>
> For those who followed my previous thread (zram OOM behavior), this is
> a different problem, not connected with OOM.  (At least I don't think
> so.)
>
> I am running a large instance of the Chrome browser on an ARM platform
> with 2 GB of RAM.  I create a zram swap device with 3 GB.  On x86, we
> have measured a compression ratio of about 3:1, so this leaves roughly
> half the RAM for compressed swap use.
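[Editor's note: a minimal sketch of how a zram swap device of this size is
typically created through sysfs on a 3.4-era kernel with the staging zram
driver; the module option, device name, and paths below are illustrative
assumptions, not taken from the report.]

    # Load the zram driver and size the device at 3 GB (assumed paths;
    # the exact setup used in the report may differ).
    modprobe zram num_devices=1
    echo $((3 * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon /dev/zram0
    # At ~3:1 compression, a full 3 GB device occupies roughly 1 GB of
    # RAM, i.e. about half of the 2 GB on this machine.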
> I am running kernel 3.4 on these platforms.  To be able to run on ARM,
> I applied a recent patch which removes the x86 dependency from
> zsmalloc.
>
> This identical setup works fine on x86.
>
> On ARM, the system starts swapping to RAM (at about 20 MB/s), but
> when it still has between 1 and 2 GB of swap space available (the zram
> device is about half full), it stops swapping (si = so = 0 in
> "vmstat 1"), and most processes stop responding.  As in my previous
> situation, some processes keep running, and they appear to be those
> that don't try to allocate memory.
>
> I can rely on SysRq and preserved memory in this situation, but the
> buffer size is 128 KB, which is not large enough for a full dump of
> all stacks.  I am attaching the (truncated) log for this case.
>
> Many processes are waiting for memory on a page fault, for instance these:
>
> [ 273.434964] chrome R running 0 4279 1175 0x00200000
> [ 273.441393] [<804e98d4>] (__schedule+0x66c/0x738) from [<804e9d2c>] (schedule+0x8c/0x90)
> [ 273.449551] [<804e9d2c>] (schedule+0x8c/0x90) from [<804e7ef0>] (schedule_timeout+0x278/0x2d4)
> [ 273.458232] [<804e7ef0>] (schedule_timeout+0x278/0x2d4) from [<804e7f7c>] (schedule_timeout_uninterruptible+0x30/0x34)
> [ 273.468995] [<804e7f7c>] (schedule_timeout_uninterruptible+0x30/0x34) from [<800bb898>] (__alloc_pages_nodemask+0x5d4/0x7a8)
> [ 273.480280] [<800bb898>] (__alloc_pages_nodemask+0x5d4/0x7a8) from [<800e2fe0>] (read_swap_cache_async+0x54/0x11c)
> [ 273.490695] [<800e2fe0>] (read_swap_cache_async+0x54/0x11c) from [<800e310c>] (swapin_readahead+0x64/0x9c)
> [ 273.500418] [<800e310c>] (swapin_readahead+0x64/0x9c) from [<800d5acc>] (handle_pte_fault+0x2d8/0x668)
> [ 273.509791] [<800d5acc>] (handle_pte_fault+0x2d8/0x668) from [<800d5f20>] (handle_mm_fault+0xc4/0xdc)
> [ 273.519079] [<800d5f20>] (handle_mm_fault+0xc4/0xdc) from [<8001b080>] (do_page_fault+0x114/0x354)
> [ 273.528105] [<8001b080>] (do_page_fault+0x114/0x354) from [<800083d8>] (do_DataAbort+0x44/0xa8)
> [ 273.536871] [<800083d8>] (do_DataAbort+0x44/0xa8) from [<8000dc78>] (__dabt_usr+0x38/0x40)
>
> [ 270.435243] Chrome_ChildIOT R running 0 3166 1175 0x00200000
> [ 270.441673] [<804e98d4>] (__schedule+0x66c/0x738) from [<8005696c>] (__cond_resched+0x30/0x40)
> [ 270.450352] [<8005696c>] (__cond_resched+0x30/0x40) from [<804e9a44>] (_cond_resched+0x40/0x50)
> [ 270.459118] [<804e9a44>] (_cond_resched+0x40/0x50) from [<800bb798>] (__alloc_pages_nodemask+0x4d4/0x7a8)
> [ 270.468755] [<800bb798>] (__alloc_pages_nodemask+0x4d4/0x7a8) from [<800e2fe0>] (read_swap_cache_async+0x54/0x11c)
> [ 270.479170] [<800e2fe0>] (read_swap_cache_async+0x54/0x11c) from [<800e310c>] (swapin_readahead+0x64/0x9c)
> [ 270.488892] [<800e310c>] (swapin_readahead+0x64/0x9c) from [<800d5acc>] (handle_pte_fault+0x2d8/0x668)
> [ 270.498265] [<800d5acc>] (handle_pte_fault+0x2d8/0x668) from [<800d5f20>] (handle_mm_fault+0xc4/0xdc)
> [ 270.507554] [<800d5f20>] (handle_mm_fault+0xc4/0xdc) from [<8001b080>] (do_page_fault+0x114/0x354)
> [ 270.516580] [<8001b080>] (do_page_fault+0x114/0x354) from [<800083d8>] (do_DataAbort+0x44/0xa8)
> [ 270.525346] [<800083d8>] (do_DataAbort+0x44/0xa8) from [<8000dc78>] (__dabt_usr+0x38/0x40)
>
> A lot of processes are in futex_wait(), probably for legitimate reasons:
>
> [ 265.650220] VC manager S 804e98d4 0 2662 1175 0x00200000
> [ 265.656648] [<804e98d4>] (__schedule+0x66c/0x738) from [<804e9d2c>] (schedule+0x8c/0x90)
> [ 265.664807] [<804e9d2c>] (schedule+0x8c/0x90) from [<8006f25c>] (futex_wait_queue_me+0xf0/0x110)
> [ 265.673661] [<8006f25c>] (futex_wait_queue_me+0xf0/0x110) from [<8006fea8>] (futex_wait+0x110/0x254)
> [ 265.682861] [<8006fea8>] (futex_wait+0x110/0x254) from [<80071440>] (do_futex+0xd4/0x97c)
> [ 265.691107] [<80071440>] (do_futex+0xd4/0x97c) from [<80071e38>] (sys_futex+0x150/0x170)
> [ 265.699266] [<80071e38>] (sys_futex+0x150/0x170) from [<8000e140>] (__sys_trace_return+0x0/0x20)
>
> A few processes are waiting on select() or other things.
>
> Can you see anything suspicious?
>
> Thanks!
> Luigi
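[Editor's note: regarding the 128 KB buffer limit mentioned above, a rough
sketch of one way to capture a complete task dump when the limiting factor
is the kernel log buffer; the 4 MB size and output path are illustrative
assumptions, not part of the report.]

    # Boot with a larger kernel log buffer by appending to the kernel
    # command line, e.g.:  log_buf_len=4M   (size is an assumption)
    echo 1 > /proc/sys/kernel/sysrq     # make sure SysRq is enabled
    echo t > /proc/sysrq-trigger        # 't' dumps every task's stack
    dmesg > /tmp/task-stacks.txt        # save the full dump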