I am now getting better results using zram on ARM. For those who followed my previous thread (zram OOM behavior), this is a different problem, not connected with OOM. (At least I don't think so.)

I am running a large instance of the Chrome browser on an ARM platform with 2 GB of RAM. I create a zram swap device with 3 GB (a sketch of an equivalent setup appears after the traces below). On x86 we have measured a compression ratio of about 3:1, so the 3 GB of swap should compress down to roughly 1 GB, leaving about half the RAM for compressed swap use. I am running kernel 3.4 on these platforms. To be able to run on ARM, I applied a recent patch which removes the x86 dependency from zsmalloc. This identical setup works fine on x86.

On ARM, the system starts swapping to RAM (at about 20 MB/second), but when it still has between 1 and 2 GB of swap space available (the zram device is about half full), swapping stops (si = so = 0 in "vmstat 1") and most processes stop responding. As in my previous report, some processes keep running, and they appear to be the ones that don't try to allocate memory.

I can rely on SysRq and the console log preserved in memory in this situation, but the buffer is only 128 KB, which is not large enough for a full dump of all stacks. I am attaching the (truncated) log for this case.

Many processes are waiting for memory on a page fault, for instance these two:

[ 273.434964] chrome R running 0 4279 1175 0x00200000
[ 273.441393] [<804e98d4>] (__schedule+0x66c/0x738) from [<804e9d2c>] (schedule+0x8c/0x90)
[ 273.449551] [<804e9d2c>] (schedule+0x8c/0x90) from [<804e7ef0>] (schedule_timeout+0x278/0x2d4)
[ 273.458232] [<804e7ef0>] (schedule_timeout+0x278/0x2d4) from [<804e7f7c>] (schedule_timeout_uninterruptible+0x30/0x34)
[ 273.468995] [<804e7f7c>] (schedule_timeout_uninterruptible+0x30/0x34) from [<800bb898>] (__alloc_pages_nodemask+0x5d4/0x7a8)
[ 273.480280] [<800bb898>] (__alloc_pages_nodemask+0x5d4/0x7a8) from [<800e2fe0>] (read_swap_cache_async+0x54/0x11c)
[ 273.490695] [<800e2fe0>] (read_swap_cache_async+0x54/0x11c) from [<800e310c>] (swapin_readahead+0x64/0x9c)
[ 273.500418] [<800e310c>] (swapin_readahead+0x64/0x9c) from [<800d5acc>] (handle_pte_fault+0x2d8/0x668)
[ 273.509791] [<800d5acc>] (handle_pte_fault+0x2d8/0x668) from [<800d5f20>] (handle_mm_fault+0xc4/0xdc)
[ 273.519079] [<800d5f20>] (handle_mm_fault+0xc4/0xdc) from [<8001b080>] (do_page_fault+0x114/0x354)
[ 273.528105] [<8001b080>] (do_page_fault+0x114/0x354) from [<800083d8>] (do_DataAbort+0x44/0xa8)
[ 273.536871] [<800083d8>] (do_DataAbort+0x44/0xa8) from [<8000dc78>] (__dabt_usr+0x38/0x40)

[ 270.435243] Chrome_ChildIOT R running 0 3166 1175 0x00200000
[ 270.441673] [<804e98d4>] (__schedule+0x66c/0x738) from [<8005696c>] (__cond_resched+0x30/0x40)
[ 270.450352] [<8005696c>] (__cond_resched+0x30/0x40) from [<804e9a44>] (_cond_resched+0x40/0x50)
[ 270.459118] [<804e9a44>] (_cond_resched+0x40/0x50) from [<800bb798>] (__alloc_pages_nodemask+0x4d4/0x7a8)
[ 270.468755] [<800bb798>] (__alloc_pages_nodemask+0x4d4/0x7a8) from [<800e2fe0>] (read_swap_cache_async+0x54/0x11c)
[ 270.479170] [<800e2fe0>] (read_swap_cache_async+0x54/0x11c) from [<800e310c>] (swapin_readahead+0x64/0x9c)
[ 270.488892] [<800e310c>] (swapin_readahead+0x64/0x9c) from [<800d5acc>] (handle_pte_fault+0x2d8/0x668)
[ 270.498265] [<800d5acc>] (handle_pte_fault+0x2d8/0x668) from [<800d5f20>] (handle_mm_fault+0xc4/0xdc)
[ 270.507554] [<800d5f20>] (handle_mm_fault+0xc4/0xdc) from [<8001b080>] (do_page_fault+0x114/0x354)
[ 270.516580] [<8001b080>] (do_page_fault+0x114/0x354) from [<800083d8>] (do_DataAbort+0x44/0xa8)
[ 270.525346] [<800083d8>] (do_DataAbort+0x44/0xa8) from [<8000dc78>] (__dabt_usr+0x38/0x40)
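For reference, the zram swap setup mentioned above is roughly equivalent to the sketch below; the device name (/dev/zram0) and the use of mkswap/swapon are assumptions for illustration, not the exact commands we run. It just sizes the device to 3 GB and enables it as swap:

/*
 * Rough sketch of the zram swap setup (assumed device name and
 * mkswap/swapon invocations, not the exact commands used).
 * Requires root and the zram module already loaded.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	unsigned long long disksize = 3ULL << 30;	/* 3 GB of uncompressed swap space */
	FILE *f = fopen("/sys/block/zram0/disksize", "w");

	if (!f) {
		perror("zram0 disksize");
		return 1;
	}
	fprintf(f, "%llu\n", disksize);		/* size the zram block device */
	fclose(f);

	/* Put a swap signature on the device and enable it. */
	if (system("mkswap /dev/zram0") != 0 || system("swapon /dev/zram0") != 0)
		return 1;

	return 0;
}

Nothing else about the swap configuration is unusual; the only deliberate choice is the 3 GB disksize, picked so that the compressed data fits in roughly half of RAM at the 3:1 ratio measured on x86.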
A lot of processes are in futex_wait(), probably for legitimate reasons:

[ 265.650220] VC manager S 804e98d4 0 2662 1175 0x00200000
[ 265.656648] [<804e98d4>] (__schedule+0x66c/0x738) from [<804e9d2c>] (schedule+0x8c/0x90)
[ 265.664807] [<804e9d2c>] (schedule+0x8c/0x90) from [<8006f25c>] (futex_wait_queue_me+0xf0/0x110)
[ 265.673661] [<8006f25c>] (futex_wait_queue_me+0xf0/0x110) from [<8006fea8>] (futex_wait+0x110/0x254)
[ 265.682861] [<8006fea8>] (futex_wait+0x110/0x254) from [<80071440>] (do_futex+0xd4/0x97c)
[ 265.691107] [<80071440>] (do_futex+0xd4/0x97c) from [<80071e38>] (sys_futex+0x150/0x170)
[ 265.699266] [<80071e38>] (sys_futex+0x150/0x170) from [<8000e140>] (__sys_trace_return+0x0/0x20)

A few processes are waiting on select() or other things.

Can you see anything suspicious?

Thanks!

Luigi
Attachment: console-ramoops64 (binary data)