On 7/15/19 12:32 PM, David Zarzycki wrote:
> Hello,
>
> In the last few weeks, one of my build boxes started hanging at the end of a build with a zombie ld.lld process stuck in the kernel:
>
> [97199.634549] CPU: 14 PID: 72214 Comm: ld.lld Kdump: loaded Not tainted 5.2.0-1.fc31.x86_64 #1
> [97199.634550] Hardware name: Supermicro SYS-5038K-i-NF9/K1SPE, BIOS 1.0b 04/13/2017
> [97199.634551] RIP: 0010:compact_zone+0x4d0/0xce0
> [97199.634553] Code: 41 c6 47 78 01 e9 52 fc ff ff 4c 89 f7 48 89 ea 4c 89 e6 e8 22 8e 02 00 49 89 c6 e9 d7 fd ff ff 8b 4c 24 10 4c 89 e2 4c 89 ee <4c> 89 ff e8 e8 e0 ff ff 49 89 c4 48 85 c0 0f 84 bd fe ff ff 45 8b
> [97199.634555] RSP: 0018:ffffac6a53c879c0 EFLAGS: 00000202
> [97199.634557] RAX: 0000000000000001 RBX: 000000000619f200 RCX: 000000000000000c
> [97199.634558] RDX: 000000000619f000 RSI: 000000000619ee20 RDI: ffff95f77ffc8330
> [97199.634559] RBP: ffff95fb7ffd4d00 R08: 0000000000000007 R09: 000000000619f000
> [97199.634561] R10: 0000000000000000 R11: 0000000000000003 R12: 000000000619f000
> [97199.634562] R13: 000000000619ee20 R14: fffffb58467b8000 R15: ffffac6a53c87a90
> [97199.634563] FS: 00007ffff10fd700(0000) GS:ffff95f5fb780000(0000) knlGS:0000000000000000
> [97199.634566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [97199.634567] CR2: 00007fff08001378 CR3: 00000054737f6000 CR4: 00000000001406e0
> [97199.634568] Call Trace:
> [97199.634569] compact_zone_order+0xde/0x140

This was likely the same as
https://bugzilla.kernel.org/show_bug.cgi?id=204165

Fixed by patch
https://marc.info/?l=linux-mm&m=156344023621776&w=2

Now commit 670105a25608 ("mm: compaction: avoid 100% CPU usage during
compaction when a task is killed")

It should hit your distro kernel at some point.

> [97199.634570] try_to_compact_pages+0xcc/0x2a0
> [97199.634570] __alloc_pages_direct_compact+0x8c/0x170
> [97199.634571] __alloc_pages_slowpath+0x248/0xdf0
> [97199.634572] ? get_vtime_delta+0x13/0xe0
> [97199.634573] ? finish_task_switch+0x12f/0x2a0
> [97199.634574] __alloc_pages_nodemask+0x2f2/0x340
> [97199.634575] do_huge_pmd_anonymous_page+0x130/0x910
> [97199.634576] __handle_mm_fault+0xfd7/0x1ac0
> [97199.634577] handle_mm_fault+0xc4/0x1f0
> [97199.634577] do_user_addr_fault+0x1f6/0x450
> [97199.634578] do_page_fault+0x33/0x120
> [97199.634579] ? page_fault+0x8/0x30
> [97199.634580] page_fault+0x1e/0x30
>
> This bug seems to go away if I comment out the following lines from my boot script:
>
> # echo always > /sys/kernel/mm/transparent_hugepage/enabled
> # echo always > /sys/kernel/mm/transparent_hugepage/defrag
>
> What can I do to debug this further?
>
> Dave
>
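
Until the fix reaches the distro kernel, it may help to confirm which THP modes are actually in effect after boot. The following is only a rough sketch: it assumes the usual sysfs convention where the active value is shown in brackets (for example "always [madvise] never"), and the helper name thp_active is made up for illustration.

```shell
#!/bin/sh
# Print the bracketed (active) value from a THP sysfs line such as
# "always [madvise] never". thp_active is a hypothetical helper name.
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active mode for the two knobs set in the boot script.
for knob in enabled defrag; do
    f=/sys/kernel/mm/transparent_hugepage/$knob
    # Skip quietly if the file is missing (e.g. kernel built without THP).
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$knob" "$(thp_active "$(cat "$f")")"
    fi
done
```

With the two echo lines commented out, both knobs should report the kernel's default rather than "always".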