On 4/17/19 10:35 AM, Li Wang wrote:
> Hi there,
>
> I caught this warning on v5.1-rc5 (s390x). It was triggered in a fork & malloc & memset stress test, but the reproduction rate is very low. I'm working on finding a stable reproducer for it.
>
> Anyone can have a look first?
>
> [ 1422.124060] WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_direct_compact+0x182/0x190

This means compaction was either skipped or deferred, yet it captured a page. Some registers hold the values 1 and 2, which are COMPACT_SKIPPED and COMPACT_DEFERRED, so it could be either of those. Probably COMPACT_SKIPPED. I think a race is possible:

- compact_zone_order() sets up current->capture_control
- compact_zone() calls compaction_suitable(), which returns COMPACT_SKIPPED, so compact_zone() returns that as well
- an interrupt comes and its processing happens to free a page that forms a high-order page; since 'current' isn't changed during an interrupt (IIRC?), the capture_control is still active and the page is captured
- compact_zone_order() does *capture = capc.page

What do you think, Mel, does it look plausible?
Not sure whether we want to try avoiding this scenario, or just remove the warning and be grateful for the successful capture :)

> [ 1422.124065] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390 des_s390 des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc ip_tables xfs libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod qeth qdio lcs ctcm ccwgroup fsm dm_mirror dm_region_hash dm_log dm_mod
> [ 1422.124086] CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc5 #1
> [ 1422.124089] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
> [ 1422.124092] Krnl PSW : 0704e00180000000 00000000002779ba (__alloc_pages_direct_compact+0x182/0x190)
> [ 1422.124096]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> [ 1422.124100] Krnl GPRS: 0000000000000000 000003e00226fc24 000003d081bdf200 0000000000000001
> [ 1422.124103]            000000000027789a 0000000000000000 0000000000000001 000000000006ee03
> [ 1422.124107]            000003e00226fc28 0000000000000cc0 0000000000000240 0000000000000002
> [ 1422.124156]            0000000000400000 0000000000753cb0 000000000027789a 000003e00226fa28
> [ 1422.124163] Krnl Code: 00000000002779ac: e320f0a80002  ltg   %r2,168(%r15)
> [ 1422.124163]            00000000002779b2: a784fff4      brc   8,27799a
> [ 1422.124163]           #00000000002779b6: a7f40001      brc   15,2779b8
> [ 1422.124163]           >00000000002779ba: a7290000      lghi  %r2,0
> [ 1422.124163]            00000000002779be: a7f4fff0      brc   15,27799e
> [ 1422.124163]            00000000002779c2: 0707          bcr   0,%r7
> [ 1422.124163]            00000000002779c4: 0707          bcr   0,%r7
> [ 1422.124163]            00000000002779c6: 0707          bcr   0,%r7
> [ 1422.124194] Call Trace:
> [ 1422.124196] ([<000000000027789a>] __alloc_pages_direct_compact+0x62/0x190)
> [ 1422.124198]  [<0000000000278618>] __alloc_pages_nodemask+0x728/0x1148
> [ 1422.124201]  [<0000000000126bb2>] crst_table_alloc+0x32/0x68
> [ 1422.124203]  [<0000000000135888>] mm_init+0x118/0x308
> [ 1422.124204]  [<0000000000137e60>] copy_process.part.49+0x1820/0x1d90
> [ 1422.124205]  [<000000000013865c>] _do_fork+0x114/0x3b8
> [ 1422.124206]  [<0000000000138aa4>] __s390x_sys_clone+0x44/0x58
> [ 1422.124210]  [<0000000000739a90>] system_call+0x288/0x2a8
> [ 1422.124210] Last Breaking-Event-Address:
> [ 1422.124212]  [<00000000002779b6>] __alloc_pages_direct_compact+0x17e/0x190
> [ 1422.124213] ---[ end trace 36649eaa36968eaa ]---
>
> --
> Regards,
> Li Wang