On 11/21/2012 07:46 AM, Luigi Semenzato wrote:
Greetings MM folks, and thanks again for fixing my previous hang-with-zram problem. I am now running into a similar problem and I hope I will not take advantage of your kindness by asking for further advice. By running a few dozen memory-hungry processes on an ARM cpu with 2 Gb RAM, with zram enabled, I can easily get into a situation where all processes are either: 1. blocked in a futex 2. trying unsuccessfully to allocate memory This happens when there should still be plenty of memory: the zram swap device is about 1/3 full. (The output of SysRq-M is at the end.) Yet the SI and SO fields of vmstat stay at 0, and CPU utilization is 100% system. procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 46 0 1076432 13636 2844 216648 0 0 0 0 621 229 0 100 0 0 44 0 1076432 13636 2844 216648 0 0 0 0 618 204 0 100 0 0 I added counters in various places in the page allocator to see which paths were being taken and noticed the following facts: - alloc_page_slowpath is looping, apparently trying to rebalance. It calls alloc_pages_direct_reclaim at a rate of about 155 times/second, and gets one page about once every 500 calls. Did_some_progress is
You use which tool to get this data?
always set to true. Then should_alloc_retry returns true (because order < PAGE_ALLOC_COSTLY_ORDER). - kswapd is asleep and is not woken up because alloc_page_slowpath never goes to the "restart" label. My questions are: 1. is it obvious to any of you what is going wrong? 1.1 is the allocation failing because nobody is waking up kswapd? And if so, why not? 2. if it's not obvious, what are the next things to look into? 3. is there a better way of debugging this? Thanks! Luigi [ 0.000000] Linux version 3.4.0 (semenzato@xxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.6.x-google 20120301 (prerelease) (gcc-4.6.3_cos_gg_2a32ae6) ) #26 SMP Tue Nov 20 14:27:15 PST 2012 [ 0.000000] CPU: ARMv7 Processor [410fc0f4] revision 4 (ARMv7), cr=10c5387d [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache [ 0.000000] Machine: SAMSUNG EXYNOS5 (Flattened Device Tree), model: Google Snow ... [ 198.564328] SysRq : Show Memory [ 198.564347] Mem-info: [ 198.564355] Normal per-cpu: [ 198.564364] CPU 0: hi: 186, btch: 31 usd: 0 [ 198.564373] CPU 1: hi: 186, btch: 31 usd: 0 [ 198.564381] HighMem per-cpu: [ 198.564389] CPU 0: hi: 90, btch: 15 usd: 0 [ 198.564397] CPU 1: hi: 90, btch: 15 usd: 0 [ 198.564411] active_anon:196868 inactive_anon:66835 isolated_anon:47 [ 198.564415] active_file:13931 inactive_file:11043 isolated_file:0 [ 198.564419] unevictable:0 dirty:4 writeback:1 unstable:0 [ 198.564423] free:3409 slab_reclaimable:2583 slab_unreclaimable:3337 [ 198.564427] mapped:137910 shmem:29899 pagetables:3972 bounce:0 [ 198.564449] Normal free:13384kB min:5380kB low:6724kB high:8068kB active_anon:782052kB inactive_anon:261808kB active_file:25020kB inactive_file:24900kB unevictable:0kB isolated(anon):16kB isolated(file):0kB present:1811520kB mlocked:0kB dirty:12kB writeback:0kB mapped:461296kB shmem:115892kB slab_reclaimable:10332kB slab_unreclaimable:13348kB kernel_stack:3008kB pagetables:15888kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:107282320 all_unreclaimable? no [ 198.564474] lowmem_reserve[]: 0 2095 2095 [ 198.564499] HighMem free:252kB min:260kB low:456kB high:656kB active_anon:5420kB inactive_anon:5532kB active_file:30704kB inactive_file:19272kB unevictable:0kB isolated(anon):172kB isolated(file):0kB present:268224kB mlocked:0kB dirty:4kB writeback:4kB mapped:90344kB shmem:3704kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:7081406 all_unreclaimable? no [ 198.564523] lowmem_reserve[]: 0 0 0 [ 198.564536] Normal: 1570*4kB 6*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 1*4096kB = 13384kB [ 198.564574] HighMem: 59*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 252kB [ 198.564610] 123112 total pagecache pages [ 198.564616] 68239 pages in swap cache [ 198.564622] Swap cache stats: add 466115, delete 397876, find 31817/56350 [ 198.564630] Free swap = 1952336kB [ 198.564635] Total swap = 3028768kB [ 198.564640] xxcount_nr_reclaimed 358488 [ 198.564646] xxcount_nr_reclaims 6201 [ 198.564651] xxcount_aborted_reclaim 0 [ 198.564656] xxcount_more_to_do 5137 [ 198.564662] xxcount_direct_reclaims 17065 [ 198.564667] xxcount_failed_direct_reclaims 10708 [ 198.564673] xxcount_no_progress 5696 [ 198.564678] xxcount_restarts 5696 [ 198.564683] xxcount_should_alloc_retry 5008 [ 198.564688] xxcount_direct_compact 1 [ 198.564693] xxcount_alloc_failed 115 [ 198.564699] xxcount_gfp_nofail 0 [ 198.564704] xxcount_costly_order 5009 [ 198.564709] xxcount_repeat 0 [ 198.564714] xxcount_kswapd_nap 2210 [ 198.564719] xxcount_kswapd_sleep 17 [ 198.564724] xxcount_kswapd_loop 2211 [ 198.564729] xxcount_kswapd_try_to_sleep 2210 [ 198.575349] 524288 pages of RAM [ 198.575358] 4420 free pages [ 198.575365] 7122 reserved pages [ 198.575371] 4091 slab pages [ 198.575378] 302549 pages shared [ 198.575384] 68239 pages swap cached -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>