Johannes Weiner wrote on 29.11.2012 18:05:
> On Thu, Nov 29, 2012 at 04:30:12PM +0100, Thorsten Leemhuis wrote:
>> Mel Gorman wrote on 29.11.2012 00:54:
>> > On Wed, Nov 28, 2012 at 02:52:15PM -0800, Andrew Morton wrote:
>> >> On Wed, 28 Nov 2012 10:13:59 +0000 Mel Gorman <mgorman@xxxxxxx> wrote:
>> >> > Based on the reports I've seen I expect the following to work for 3.7:
>> >> >
>> >> > Keep
>> >> >   96710098 mm: revert "mm: vmscan: scale number of pages reclaimed
>> >> >            by reclaim/compaction based on failures"
>> >> >   ef6c5be6 fix incorrect NR_FREE_PAGES accounting (appears like
>> >> >            memory leak)
>> >> >
>> >> > Revert
>> >> >   82b212f4 Revert "mm: remove __GFP_NO_KSWAPD"
>> >> >
>> >> > Merge
>> >> >   mm: vmscan: fix kswapd endless loop on higher order allocation
>> >> >   mm: Avoid waking kswapd for THP allocations when compaction is
>> >> >       deferred or contended
>> >>
>> >> "mm: Avoid waking kswapd for THP ..." is marked "I have not tested it
>> >> myself" and when Zdenek tested it he hit an unexplained OOM.
>> >
>> > I thought Zdenek was testing with __GFP_NO_KSWAPD when he hit that OOM.
>> > Further, when he hit that OOM, it looked like a genuine OOM. He had no
>> > swap configured and inactive/active file pages were very low. Finally,
>> > the free pages for Normal looked off and could also have been affected
>> > by the accounting bug. I'm looking at https://lkml.org/lkml/2012/11/18/132
>> > here. Are you thinking of something else?
>> >
>> > I have not tested with the patch admittedly, but Thorsten has and seemed
>> > to be OK with it: https://lkml.org/lkml/2012/11/23/276
>>
>> Yeah, on my two main workhorses a few different kernels based on rc6 or
>> rc7 worked fine with this patch. But sorry, it seems the patch doesn't
>> fix the problems Fedora user John Ellson sees, who tried kernels I built
>> in the Fedora buildsystem. Details:
> [...]
>> I know this makes things more complicated again, but I wanted to let
>> you guys know that some problem might still be lurking somewhere. Side
>> note: right now it seems John, with kernels that contain
>> "Avoid-waking-kswapd-for-THP-allocations-when", can trigger the problem
>> quicker (or only?) on i686 than on x86-64.
>
> Humm, highmem... Could this be the lowmem protection forcing kswapd
> to reclaim highmem at DEF_PRIORITY (not useful, but burns CPU) every
> time it's woken up?
>
> This requires somebody to wake up kswapd regularly, though, and from
> his report it's not quite clear to me if kswapd gets stuck or just has
> really high CPU usage while the system is still under load. The
> initial post says he would expect "<5% cpu when idling", but his top
> snippet in there shows there are other tasks running as well. So does
> it happen while the system is busy or when it's otherwise idle?
>
> [ On the other hand, not waking kswapd from THP allocations seems to
>   not show this problem on his i686 machine. But it could also just
>   be a tiny window of conditions aligning perfectly that drops kswapd
>   into an endless loop, and the increased wakeups increase the
>   probability of hitting it. So, yeah, this would be good to know. ]
>
> As the system is still responsive when this happens, any chance he
> could capture /proc/zoneinfo and /proc/vmstat when kswapd goes
> haywire?
>
> Or even run perf record -a -g sleep 5; perf report > kswapd.txt?
>
> Preferably with this patch applied, to rule out faulty lowmem
> protection:
>
> buffer_heads_over_limit can put kswapd into reclaim, but it's ignored
> when figuring out whether the zone is balanced, so priority levels
> are not descended and no progress is ever made.
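If I read that description correctly, the failure mode is a loop whose
exit condition ignores the very flag that keeps it running. A toy
userspace model of my understanding (illustrative only; the names
mirror your mail, none of this is the actual vmscan code):

  #include <stdbool.h>
  #include <stdio.h>

  #define DEF_PRIORITY 12

  static bool buffer_heads_over_limit = true; /* what put kswapd into reclaim */
  static bool watermarks_met = true;          /* free pages look fine */

  /* the modeled bug: the balance check ignores buffer_heads_over_limit */
  static bool zone_balanced(void)
  {
          return watermarks_met;
  }

  int main(void)
  {
          int priority = DEF_PRIORITY;
          long passes = 0;

          /* kswapd keeps reclaiming while buffer heads are over the limit... */
          while (buffer_heads_over_limit || !zone_balanced()) {
                  /* ...but the zone looks balanced, so priority is never
                     descended and no forward progress is ever made */
                  if (!zone_balanced())
                          priority--;

                  if (++passes == 1000000) { /* cap the demo; the real thing spins */
                          printf("still at priority %d after %ld passes\n",
                                 priority, passes);
                          break;
                  }
          }
          return 0;
  }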
/me wonders how to elegantly get out of his man-in-the-middle position

John was able to reproduce the problem quickly with a kernel that
contained the patch from your mail. For details see
https://bugzilla.redhat.com/show_bug.cgi?id=866988#c42 and the later
comments. He provided the information there; the relevant parts follow.
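For anyone who wants to reproduce the capture: it boils down to
something like the following, run while kswapd0 is spinning (the file
names are my own convention; the perf invocation is the one from your
mail):

  cat /proc/zoneinfo > zoneinfo-kswapd.txt   # per-zone watermarks and counters
  cat /proc/vmstat > vmstat-kswapd.txt       # global VM counters
  perf record -a -g sleep 5                  # 5s system-wide profile, call graphs
  perf report > kswapd.txt                   # fold the samples into a report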
/proc/vmstat while kswapd0 was at 100% CPU:

nr_free_pages 196858
nr_inactive_anon 15804
nr_active_anon 65
nr_inactive_file 20792
nr_active_file 11307
nr_unevictable 0
nr_mlock 0
nr_anon_pages 14385
nr_mapped 2393
nr_file_pages 32563
nr_dirty 5
nr_writeback 0
nr_slab_reclaimable 3113
nr_slab_unreclaimable 4725
nr_page_table_pages 271
nr_kernel_stack 96
nr_unstable 0
nr_bounce 0
nr_vmscan_write 1487
nr_vmscan_immediate_reclaim 3
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 381
nr_dirtied 388323
nr_written 361128
nr_anon_transparent_hugepages 1
nr_free_cma 0
nr_dirty_threshold 38188
nr_dirty_background_threshold 19094
pgpgin 1057223
pgpgout 1552306
pswpin 8
pswpout 1487
pgalloc_dma 5548
pgalloc_normal 10651864
pgalloc_high 2191246
pgalloc_movable 0
pgfree 13055503
pgactivate 440358
pgdeactivate 259724
pgfault 31423675
pgmajfault 3760
pgrefill_dma 2174
pgrefill_normal 212914
pgrefill_high 51755
pgrefill_movable 0
pgsteal_kswapd_dma 1
pgsteal_kswapd_normal 202106
pgsteal_kswapd_high 36515
pgsteal_kswapd_movable 0
pgsteal_direct_dma 18
pgsteal_direct_normal 0
pgsteal_direct_high 3818
pgsteal_direct_movable 0
pgscan_kswapd_dma 1
pgscan_kswapd_normal 203044
pgscan_kswapd_high 40407
pgscan_kswapd_movable 0
pgscan_direct_dma 18
pgscan_direct_normal 0
pgscan_direct_high 4409
pgscan_direct_movable 0
pgscan_direct_throttle 0
pginodesteal 0
slabs_scanned 264192
kswapd_inodesteal 171676
kswapd_low_wmark_hit_quickly 0
kswapd_high_wmark_hit_quickly 26
kswapd_skip_congestion_wait 0
pageoutrun 117729182
allocstall 5
pgrotated 1628
compact_blocks_moved 313
compact_pages_moved 7192
compact_pagemigrate_failed 265
compact_stall 13
compact_fail 9
compact_success 4
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 2985
unevictable_pgs_scanned 0
unevictable_pgs_rescued 1877
unevictable_pgs_mlocked 3965
unevictable_pgs_munlocked 3965
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 636
thp_fault_fallback 10
thp_collapse_alloc 342
thp_collapse_alloc_failed 2
thp_split 6

/proc/zoneinfo while kswapd0 was at 100% CPU:

Node 0, zone      DMA
  pages free     1655
        min      196
        low      245
        high     294
        scanned  0
        spanned  4080
        present  3951
    nr_free_pages 1655
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock 0
    nr_anon_pages 0
    nr_mapped 0
    nr_file_pages 0
    nr_dirty 0
    nr_writeback 0
    nr_slab_reclaimable 3
    nr_slab_unreclaimable 1
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem 0
    nr_dirtied 315
    nr_written 315
    nr_anon_transparent_hugepages 0
    nr_free_cma 0
        protection: (0, 861, 1000, 1000)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 2
  all_unreclaimable: 1
  start_pfn:         16
  inactive_ratio:    1
Node 0, zone   Normal
  pages free     186234
        min      10953
        low      13691
        high     16429
        scanned  0
        spanned  222206
        present  220470
    nr_free_pages 186234
    nr_inactive_anon 3147
    nr_active_anon 2
    nr_inactive_file 14064
    nr_active_file 4672
    nr_unevictable 0
    nr_mlock 0
    nr_anon_pages 3028
    nr_mapped 216
    nr_file_pages 18857
    nr_dirty 8
    nr_writeback 0
    nr_slab_reclaimable 3110
    nr_slab_unreclaimable 4729
    nr_page_table_pages 62
    nr_kernel_stack 96
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 311
    nr_vmscan_immediate_reclaim 2
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem 114
    nr_dirtied 339809
    nr_written 315061
    nr_anon_transparent_hugepages 0
    nr_free_cma 0
        protection: (0, 0, 1111, 1111)
  pagesets
    cpu: 0
              count: 81
              high:  186
              batch: 31
  vm stats threshold: 8
  all_unreclaimable: 0
  start_pfn:         4096
  inactive_ratio:    1
Node 0, zone  HighMem
  pages free     8983
        min      34
        low      475
        high     917
        scanned  0
        spanned  35840
        present  35560
    nr_free_pages 8983
    nr_inactive_anon 12661
    nr_active_anon 64
    nr_inactive_file 6849
    nr_active_file 6500
    nr_unevictable 0
    nr_mlock 0
    nr_anon_pages 11357
    nr_mapped 2177
    nr_file_pages 13692
    nr_dirty 0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 209
    nr_kernel_stack 0
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 1176
    nr_vmscan_immediate_reclaim 1
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem 267
    nr_dirtied 48189
    nr_written 45739
    nr_anon_transparent_hugepages 1
    nr_free_cma 0
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 20
              high:  42
              batch: 7
  vm stats threshold: 4
  all_unreclaimable: 0
  start_pfn:         226302
  inactive_ratio:    1
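An aside on reading the protection lines above, since your
lowmem-protection theory hinges on them: as far as I understand it, the
kernel's watermark check adds the zone's lowmem reserve for the
allocation's class zone to the watermark; the real check lives in
__zone_watermark_ok(), the helper below is my own simplified sketch
using John's Normal-zone numbers:

  #include <stdbool.h>
  #include <stdio.h>

  /* simplified model of the kernel's watermark test: a zone passes for
     an allocation of class classzone_idx when its free pages exceed the
     watermark plus the zone's lowmem reserve ("protection") for that class */
  static bool watermark_ok(long free, long mark,
                           const long *protection, int classzone_idx)
  {
          return free > mark + protection[classzone_idx];
  }

  int main(void)
  {
          /* Normal zone from John's dump: free 186234, high 16429,
             protection (0, 0, 1111, 1111); HighMem is class index 2 */
          const long prot[] = { 0, 0, 1111, 1111 };

          printf("Normal balanced for a highmem allocation: %s\n",
                 watermark_ok(186234, 16429, prot, 2) ? "yes" : "no");
          return 0;
  }

If I plug in all three zones this way, each passes its high watermark
even with protection applied, which fits the picture of kswapd looping
(pageoutrun is at 117729182, against an allocstall of just 5) even
though the zones look balanced.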
First part of the perf report captured while kswapd0 was at 100% CPU
(the earlier label calling this /proc/vmstat was my mistake):

# ========
# captured on: Fri Nov 30 07:22:00 2012
# hostname : rawhide
# os release : 3.7.0-0.rc7.git1.2.van.main.knurd.kswap.3.fc18.i686
# perf version : 3.7.0-0.rc7.git1.2.van.main.knurd.kswap.3.fc18.i686
# arch : i686
# nrcpus online : 1
# nrcpus avail : 1
# cpudesc : QEMU Virtual CPU version 1.0.1
# cpuid : AuthenticAMD,6,2,3
# total memory : 1027716 kB
# cmdline : /usr/bin/perf record -g -a sleep 5
# event : name = cpu-clock, type = 1, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, id = { 7 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# pmu mappings: software = 1, tracepoint = 2, breakpoint = 5
# ========
#
# Samples: 20K of event 'cpu-clock'
# Event count (approx.): 20016
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  .................................
#
    16.52%  kswapd0  [kernel.kallsyms]  [k] idr_get_next
            |
            --- idr_get_next
               |
               |--99.76%-- css_get_next
               |          mem_cgroup_iter
               |          |
               |          |--50.49%-- shrink_zone
               |          |          kswapd
               |          |          kthread
               |          |          ret_from_kernel_thread
               |          |
               |           --49.51%-- kswapd
               |                     kthread
               |                     ret_from_kernel_thread
                --0.24%-- [...]

    11.23%  kswapd0  [kernel.kallsyms]  [k] prune_super
            |
            --- prune_super
               |
               |--86.74%-- shrink_slab
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
                --13.26%-- kswapd
                          kthread
                          ret_from_kernel_thread

    10.73%  kswapd0  [kernel.kallsyms]  [k] shrink_slab
            |
            --- shrink_slab
               |
               |--99.58%-- kswapd
               |          kthread
               |          ret_from_kernel_thread
                --0.42%-- [...]

     7.36%  kswapd0  [kernel.kallsyms]  [k] grab_super_passive
            |
            --- grab_super_passive
               |
               |--92.46%-- prune_super
               |          shrink_slab
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
                --7.54%-- shrink_slab
                          kswapd
                          kthread
                          ret_from_kernel_thread

     5.82%  kswapd0  [kernel.kallsyms]  [k] _raw_spin_lock
            |
            --- _raw_spin_lock
               |
               |--34.28%-- put_super
               |          drop_super
               |          prune_super
               |          shrink_slab
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
               |--30.50%-- grab_super_passive
               |          prune_super
               |          shrink_slab
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
               |--17.27%-- prune_super
               |          shrink_slab
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
               |--16.15%-- drop_super
               |          prune_super
               |          shrink_slab
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
               |--1.20%-- mb_cache_shrink_fn
               |          shrink_slab
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
                --0.60%-- shrink_slab
                          kswapd
                          kthread
                          ret_from_kernel_thread

     4.43%  kswapd0  [kernel.kallsyms]  [k] fill_contig_page_info
            |
            --- fill_contig_page_info
               |
               |--99.10%-- fragmentation_index
               |          compaction_suitable
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
                --0.90%-- compaction_suitable
                          kswapd
                          kthread
                          ret_from_kernel_thread

     3.81%  kswapd0  [kernel.kallsyms]  [k] shrink_lruvec
            |
            --- shrink_lruvec
               |
               |--99.34%-- shrink_zone
               |          kswapd
               |          kthread
               |          ret_from_kernel_thread
               |
                --0.66%-- kswapd
                          kthread
                          ret_from_kernel_thread

The rest is at https://bugzilla.redhat.com/attachment.cgi?id=654977

CU
Thorsten