Re: [Bug 204165] New: 100% CPU usage in compact_zone_order


 



On Tuesday, July 16, 2019 12:03 PM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> I tried reproducing this but after 300 attempts with various parameters
> and adding other workloads in the background, I was unable to reproduce
> the problem.
> 


The third time I ran `$ time stress -m 220 --vm-bytes 10000000000 --timeout 10`, I got 10+ hung stress processes:

  PID  %CPU COMMAND  PR  NI    VIRT    RES S USER
 3785  94.5 stress   20   0 9769416      4 R user
 3777  87.3 stress   20   0 9769416      4 R user
 3923  85.5 stress   20   0 9769416      4 R user
 3937  85.5 stress   20   0 9769416      4 R user
 3943  81.8 stress   20   0 9769416      4 R user
 3885  80.0 stress   20   0 9769416      4 R user
 3970  80.0 stress   20   0 9769416      4 R user
 3902  76.4 stress   20   0 9769416      4 R user
 3954  72.7 stress   20   0 9769416      4 R user
 3868  70.9 stress   20   0 9769416      4 R user
 3893  69.1 stress   20   0 9769416      4 R user
 3786  65.5 stress   20   0 9769416      4 R user
 3783  60.0 stress   20   0 9769416      4 R user
 3848  58.2 stress   20   0 9769416      4 R user
 3863  58.2 stress   20   0 9769416      4 R user


The full terminal session looked like this:
```
TERM='xterm-256color'
53.48 573.49
-----------
user@i87k 2019/07/16 20:36:47 -bash5.0.7 t:5 j:0 d:3 pp:1140 p:1623 ut53
!41866 1 0  5.2.1-g527a3db363a3 #71 SMP Tue Jul 16 19:41:12 CEST 2019
/home/user 

$ time stress -m 220 --vm-bytes 10000000000 --timeout 10
stress: info: [1744] dispatching hogs: 0 cpu, 0 io, 220 vm, 0 hdd
stress: info: [1744] successful run completed in 19s

real	0m19.036s
user	0m0.794s
sys	2m59.583s
-----------
user@i87k 2019/07/16 20:37:12 -bash5.0.7 t:5 j:0 d:3 pp:1140 p:1623 ut79
!41867 2 0  5.2.1-g527a3db363a3 #71 SMP Tue Jul 16 19:41:12 CEST 2019
/home/user 

$ time stress -m 220 --vm-bytes 10000000000 --timeout 10
stress: info: [3520] dispatching hogs: 0 cpu, 0 io, 220 vm, 0 hdd
stress: info: [3520] successful run completed in 18s

real	0m18.657s
user	0m0.901s
sys	2m59.700s
-----------
user@i87k 2019/07/16 20:42:28 -bash5.0.7 t:5 j:0 d:3 pp:1140 p:1623 ut394
!41868 3 0  5.2.1-g527a3db363a3 #71 SMP Tue Jul 16 19:41:12 CEST 2019
/home/user 

$ time stress -m 220 --vm-bytes 10000000000 --timeout 10
stress: info: [3771] dispatching hogs: 0 cpu, 0 io, 220 vm, 0 hdd


```
(To be clear, I did wait a few minutes before starting the 3rd run; I don't remember exactly which trivial things I was doing in the meantime - mostly making sure I had the right trace command(s).)

I'm pretty sure you need swap on (ext4) zram for this to trigger (or at least to trigger faster), judging by the fuller stack traces shown in my previous email(s), obtained via crash's `bt -Tsx <pid>`.
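
For reference, the swap-on-zram setup on this machine is roughly the sketch below - reconstructed from memory, so the size, priority and algorithm values are illustrative rather than an exact copy of my setup script:

```
# Rough sketch of a swap-on-zram setup (values are illustrative, not exact):
modprobe zram num_devices=1
echo zstd > /sys/block/zram0/comp_algorithm   # zstd, which shows up in the stacks below
echo 16G > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 100 /dev/zram0
```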

On Tuesday, July 16, 2019 9:11 AM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> High CPU usage in this path is not something I've observed recently.
> When it happens and CPU usage is high, can you run the following commands
> please?
> 

> trace-cmd record -e compaction:* sleep 10
> trace-cmd report > trace.log
> 

> and send me the resulting trace.log please?

Ok, getting trace.log as requested:

```
$ sudo trace-cmd record -e compaction:* sleep 10
[sudo] password for user: 

CPU 0: 12430 events lost
CPU 1: 83959 events lost
CPU 4: 13447 events lost
CPU 6: 2825 events lost
CPU 8: 791 events lost
CPU 11: 8475 events lost
CPU0 data recorded at offset=0x5bc000
    114487296 bytes in size
CPU1 data recorded at offset=0x72eb000
    106885120 bytes in size
CPU2 data recorded at offset=0xd8da000
    125046784 bytes in size
CPU3 data recorded at offset=0x1501b000
    111022080 bytes in size
CPU4 data recorded at offset=0x1b9fc000
    120532992 bytes in size
CPU5 data recorded at offset=0x22cef000
    115990528 bytes in size
CPU6 data recorded at offset=0x29b8d000
    116109312 bytes in size
CPU7 data recorded at offset=0x30a48000
    73822208 bytes in size
CPU8 data recorded at offset=0x350af000
    98643968 bytes in size
CPU9 data recorded at offset=0x3aec2000
    96514048 bytes in size
CPU10 data recorded at offset=0x40acd000
    113967104 bytes in size
CPU11 data recorded at offset=0x4777d000
    127184896 bytes in size

trace.dat is 1.3G
-rw-r--r--  1 root root 1326219264 16.07.2019 20:45 trace.dat

$ LD_PRELOAD=/usr/lib/trace-cmd/python/ctracecmd.so trace-cmd report > trace.log
trace-cmd: symbol lookup error: /usr/lib/trace-cmd/python/ctracecmd.so: undefined symbol: PyExc_SystemError

$ trace-cmd report > trace.log
  could not load plugin '/usr/lib/trace-cmd/plugins/plugin_python.so'
/usr/lib/trace-cmd/plugins/plugin_python.so: undefined symbol: PyString_FromString
```

I guess we could ignore that?

trace.log is like 4.3G:
-rw-r--r--  1 user user 4370520245 16.07.2019 20:50 trace.log
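
Given the "events lost" lines above, I can redo the capture with a larger per-CPU ring buffer if that matters; something along these lines (the buffer size is just a guess, not tuned):

```
# Re-record with a bigger per-CPU ring buffer (-b takes KB; 100000 is a guess)
# to avoid the dropped events seen above:
sudo trace-cmd record -b 100000 -e compaction:* sleep 10
trace-cmd report > trace.log
```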


On Tuesday, July 16, 2019 12:03 PM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> I tried reproducing this but after 300 attempts with various parameters
> and adding other workloads in the background, I was unable to reproduce
> the problem.
> 



As a reminder, here is what the sysrq+l stack traces look like (for two of the stress PIDs):
```
[ 1294.913508] NMI backtrace for cpu 5
[ 1294.913517] CPU: 5 PID: 3848 Comm: stress Kdump: loaded Tainted: G     U            5.2.1-g527a3db363a3 #71
[ 1294.913522] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019
[ 1294.913526] RIP: 0010:ftrace_likely_update+0x1a/0x200
[ 1294.913533] Code: 0b eb bb 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 83 ec 20 48 89 fb 41 89 d4 9c 41 5f 0f 01 ca <85> c9 0f 84 8b 00 00 00 48 ff 47 28 8b 15 34 72 47 01 85 d2 75 16
[ 1294.913537] RSP: 0000:ffffa3cd4c7ef848 EFLAGS: 00000286
[ 1294.913552] RAX: 0000000000000000 RBX: ffffffff97546180 RCX: 0000000000000000
[ 1294.913557] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff97546180
[ 1294.913562] RBP: 000000000017cc6f R08: 0000000000000000 R09: 0000000000000000
[ 1294.913567] R10: 0000000000000001 R11: 0000000000000004 R12: 0000000000000000
[ 1294.913571] R13: ffff93560dfdc000 R14: 000000000012f44c R15: 0000000000000286
[ 1294.913577] FS:  000072d6afdff740(0000) GS:ffff9355ed880000(0000) knlGS:0000000000000000
[ 1294.913582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1294.913590] CR2: 000070fbe8abdc88 CR3: 0000000827a46004 CR4: 00000000003606e0
[ 1294.913595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1294.913600] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1294.913605] Call Trace:
[ 1294.913609]  _cond_resched+0x2d/0x50
[ 1294.913613]  isolate_migratepages_block+0xf8/0xe40
[ 1294.913618]  compact_zone+0x4f9/0xe10
[ 1294.913626]  compact_zone_order+0xe3/0x120
[ 1294.913630]  try_to_compact_pages+0xde/0x3b0
[ 1294.913635]  __alloc_pages_direct_compact+0x8c/0x170
[ 1294.913639]  __alloc_pages_slowpath+0x65c/0x1290
[ 1294.913644]  __alloc_pages_nodemask+0x4cf/0x530
[ 1294.913648]  do_huge_pmd_anonymous_page+0x17c/0x780
[ 1294.913653]  __handle_mm_fault+0xeee/0x17d0
[ 1294.913661]  handle_mm_fault+0x17b/0x330
[ 1294.913666]  __do_page_fault+0x34e/0x800
[ 1294.913671]  do_page_fault+0x57/0x1f9
[ 1294.913675]  ? page_fault+0x8/0x30
[ 1294.913680]  page_fault+0x1e/0x30
[ 1294.913684] RIP: 0033:0x5a46e611ac10
[ 1294.913690] Code: c0 0f 84 53 02 00 00 8b 54 24 0c 31 c0 85 d2 0f 94 c0 89 04 24 41 83 fd 02 0f 8f fa 00 00 00 31 c0 4d 85 ff 7e 10 0f 1f 40 00 <c6> 04 03 5a 4c 01 f0 49 39 c7 7f f4 4d 85 e4 0f 84 f4 01 00 00 7e
[ 1294.913695] RSP: 002b:00007fffa51f0f60 EFLAGS: 00010206
[ 1294.913705] RAX: 00000000098c0000 RBX: 000072d45bd40010 RCX: 000072d6aff243db
[ 1294.913710] RDX: 0000000000000000 RSI: 00000002540bf000 RDI: 000072d45bd40000
[ 1294.913715] RBP: 00005a46e611ba54 R08: 000072d45bd40010 R09: 0000000000000000
[ 1294.913720] R10: 0000000000000022 R11: 00000002540be400 R12: ffffffffffffffff
[ 1294.913724] R13: 0000000000000002 R14: 0000000000001000 R15: 00000002540be400
[ 1294.913730] NMI backtrace for cpu 11
[ 1294.913739] CPU: 11 PID: 3902 Comm: stress Kdump: loaded Tainted: G     U            5.2.1-g527a3db363a3 #71
[ 1294.913744] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2201 05/27/2019
[ 1294.913748] RIP: 0010:isolate_migratepages_block+0x97b/0xe40
[ 1294.913754] Code: c6 43 79 01 4c 8b 74 24 08 4d 39 fe b9 00 00 00 00 ba 00 00 00 00 40 0f 92 c6 40 0f b6 f6 48 c7 c7 90 ac 56 97 e8 05 fe f5 ff <4d> 39 fe 0f 82 7a f9 ff ff 4c 39 7c 24 08 0f 84 6f f9 ff ff 0f 1f
[ 1294.913758] RSP: 0000:ffffa3cd4c99f8b0 EFLAGS: 00000286
[ 1294.913768] RAX: 0000000000000000 RBX: ffffa3cd4c99fa50 RCX: 0000000000000000
[ 1294.913777] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9756ac90
[ 1294.913781] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[ 1294.913786] R10: 0000000000000001 R11: 0000000000000004 R12: 0000000000000000
[ 1294.913790] R13: 0000000000000000 R14: 000000000070ee00 R15: 000000000070ece0
[ 1294.913795] FS:  000072d6afdff740(0000) GS:ffff9355edb80000(0000) knlGS:0000000000000000
[ 1294.913799] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1294.913804] CR2: 00007d99c6200000 CR3: 00000007ecb1a003 CR4: 00000000003606e0
[ 1294.913813] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1294.913817] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1294.913822] Call Trace:
[ 1294.913826]  compact_zone+0x4f9/0xe10
[ 1294.913830]  compact_zone_order+0xe3/0x120
[ 1294.913835]  try_to_compact_pages+0xde/0x3b0
[ 1294.913839]  __alloc_pages_direct_compact+0x8c/0x170
[ 1294.913848]  __alloc_pages_slowpath+0x65c/0x1290
[ 1294.913852]  __alloc_pages_nodemask+0x4cf/0x530
[ 1294.913856]  do_huge_pmd_anonymous_page+0x17c/0x780
[ 1294.913861]  __handle_mm_fault+0xeee/0x17d0
[ 1294.913865]  handle_mm_fault+0x17b/0x330
[ 1294.913869]  __do_page_fault+0x34e/0x800
[ 1294.913874]  do_page_fault+0x57/0x1f9
[ 1294.913882]  ? page_fault+0x8/0x30
[ 1294.913886]  page_fault+0x1e/0x30
[ 1294.913891] RIP: 0033:0x5a46e611ac10
[ 1294.913896] Code: c0 0f 84 53 02 00 00 8b 54 24 0c 31 c0 85 d2 0f 94 c0 89 04 24 41 83 fd 02 0f 8f fa 00 00 00 31 c0 4d 85 ff 7e 10 0f 1f 40 00 <c6> 04 03 5a 4c 01 f0 49 39 c7 7f f4 4d 85 e4 0f 84 f4 01 00 00 7e
[ 1294.913901] RSP: 002b:00007fffa51f0f60 EFLAGS: 00010206
[ 1294.913907] RAX: 000000000d0c0000 RBX: 000072d45bd40010 RCX: 000072d6aff243db
[ 1294.913916] RDX: 0000000000000000 RSI: 00000002540bf000 RDI: 000072d45bd40000
[ 1294.913921] RBP: 00005a46e611ba54 R08: 000072d45bd40010 R09: 0000000000000000
[ 1294.913925] R10: 0000000000000022 R11: 00000002540be400 R12: ffffffffffffffff
[ 1294.913929] R13: 0000000000000002 R14: 0000000000001000 R15: 00000002540be400
```
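
For reference, I trigger these dumps with sysrq from a shell, roughly like this:

```
# Dump backtraces of all active CPUs via sysrq-l (sysrq must be enabled),
# then read them from the kernel log:
echo 1 > /proc/sys/kernel/sysrq
echo l > /proc/sysrq-trigger
dmesg | tail -n 300
```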

and here's how they look via crash:
```
$ sudo crash "/usr/lib/modules/$(uname -r)/build/vmlinux"
[sudo] password for user: 


crash 7.2.6
Copyright (C) 2002-2019  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [336MB]: patching 121800 gdb minimal_symbol values

      KERNEL: /usr/lib/modules/5.2.1-g527a3db363a3/build/vmlinux       

    DUMPFILE: /proc/kcore
        CPUS: 12
        DATE: Tue Jul 16 20:59:48 2019
      UPTIME: 00:23:54
LOAD AVERAGE: 26.83, 21.74, 16.35
       TASKS: 508
    NODENAME: i87k
     RELEASE: 5.2.1-g527a3db363a3
     VERSION: #71 SMP Tue Jul 16 19:41:12 CEST 2019
     MACHINE: x86_64  (3700 Mhz)
      MEMORY: 31.9 GB
         PID: 6417
     COMMAND: "crash"
        TASK: ffff934f9c9a8000  [THREAD_INFO: ffff934f9c9a8000]
         CPU: 2
       STATE: TASK_RUNNING (ACTIVE)

crash> bt -Tsx 3848
PID: 3848   TASK: ffff934f83373d80  CPU: 5   COMMAND: "stress"
  [ffffa3cd4c7eefb8] get_page_from_freelist+0xaa9 at ffffffff96291799
  [ffffa3cd4c7ef0e8] get_page_from_freelist+0xaa9 at ffffffff96291799
  [ffffa3cd4c7ef108] __alloc_pages_slowpath+0x216 at ffffffff96292cb6
  [ffffa3cd4c7ef130] get_page_from_freelist+0xb76 at ffffffff96291866
  [ffffa3cd4c7ef178] get_page_from_freelist+0xaa9 at ffffffff96291799
  [ffffa3cd4c7ef1a0] trace_hardirqs_on_caller+0x32 at ffffffff961bc4b2
  [ffffa3cd4c7ef2e0] ZSTD_compressSequences_internal+0x8db at ffffffff9660a27b
  [ffffa3cd4c7ef388] ZSTD_compressSequences_internal+0x8db at ffffffff9660a27b
  [ffffa3cd4c7ef4b0] get_zspage_mapping+0x40 at ffffffff962e2220
  [ffffa3cd4c7ef538] decay_load+0x6a at ffffffff96111d1a
  [ffffa3cd4c7ef5a0] __list_add_valid+0x90 at ffffffff965e3ff0
  [ffffa3cd4c7ef650] check_preempt_wakeup+0x267 at ffffffff960f7097
  [ffffa3cd4c7ef6b8] decay_load+0x6a at ffffffff96111d1a
  [ffffa3cd4c7ef700] trace_hardirqs_on_thunk+0x1a at ffffffff96001b02
  [ffffa3cd4c7ef738] update_load_avg+0xca at ffffffff960f73aa
  [ffffa3cd4c7ef740] update_load_avg+0xca at ffffffff960f73aa
  [ffffa3cd4c7ef7a0] finish_task_switch+0xc9 at ffffffff960e6879
  [ffffa3cd4c7ef7e8] finish_task_switch+0x17f at ffffffff960e692f
  [ffffa3cd4c7ef820] __schedule+0x552 at ffffffff969f4752
  [ffffa3cd4c7ef8a8] isolate_migratepages_block+0x97b at ffffffff9626165b
  [ffffa3cd4c7ef918] isolate_migratepages_block+0xf at ffffffff96260cef
  [ffffa3cd4c7ef978] compact_zone+0x4f9 at ffffffff96263eb9
  [ffffa3cd4c7efa38] compact_zone_order+0xe3 at ffffffff962648b3
  [ffffa3cd4c7efaf0] try_to_compact_pages+0xde at ffffffff9626523e
  [ffffa3cd4c7efb60] __alloc_pages_direct_compact+0x8c at ffffffff9629285c
  [ffffa3cd4c7efbb8] __alloc_pages_slowpath+0x65c at ffffffff962930fc
  [ffffa3cd4c7efce8] __alloc_pages_nodemask+0x4cf at ffffffff962941ff
  [ffffa3cd4c7efd60] do_huge_pmd_anonymous_page+0x17c at ffffffff962c6fcc
  [ffffa3cd4c7efdb0] __handle_mm_fault+0xeee at ffffffff96272c3e
  [ffffa3cd4c7efe78] handle_mm_fault+0x17b at ffffffff9627369b
  [ffffa3cd4c7efeb0] __do_page_fault+0x34e at ffffffff9604f8ee
  [ffffa3cd4c7eff20] do_page_fault+0x57 at ffffffff9604fe27
  [ffffa3cd4c7eff38] page_fault+0x8 at ffffffff96a00e78
  [ffffa3cd4c7eff50] page_fault+0x1e at ffffffff96a00e8e
    RIP: 00005a46e611ac10  RSP: 00007fffa51f0f60  RFLAGS: 00010206
    RAX: 00000000098c0000  RBX: 000072d45bd40010  RCX: 000072d6aff243db
    RDX: 0000000000000000  RSI: 00000002540bf000  RDI: 000072d45bd40000
    RBP: 00005a46e611ba54   R8: 000072d45bd40010   R9: 0000000000000000
    R10: 0000000000000022  R11: 00000002540be400  R12: ffffffffffffffff
    R13: 0000000000000002  R14: 0000000000001000  R15: 00000002540be400
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash> bt -l 3848
PID: 3848   TASK: ffff934f83373d80  CPU: 11  COMMAND: "stress"
 #0 [ffffa3cd4c7ef7f0] __schedule at ffffffff969f471a
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/kernel/sched/core.c: 2818
 #1 [ffffa3cd4c7ef830] trace_hardirqs_on_thunk at ffffffff96001b02
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/arch/x86/entry/thunk_64.S: 42
 #2 [ffffa3cd4c7ef8a8] isolate_migratepages_block at ffffffff9626165b
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/compaction.c: 1039
 #3 [ffffa3cd4c7ef978] compact_zone at ffffffff96263eb9
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/compaction.c: 1817
 #4 [ffffa3cd4c7efa38] compact_zone_order at ffffffff962648b3
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/compaction.c: 2313
 #5 [ffffa3cd4c7efaf0] try_to_compact_pages at ffffffff9626523e
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/compaction.c: 2362
 #6 [ffffa3cd4c7efb60] __alloc_pages_direct_compact at ffffffff9629285c
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/page_alloc.c: 3831
 #7 [ffffa3cd4c7efbb8] __alloc_pages_slowpath at ffffffff962930fc
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/page_alloc.c: 4470
 #8 [ffffa3cd4c7efce8] __alloc_pages_nodemask at ffffffff962941ff
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/page_alloc.c: 4678
 #9 [ffffa3cd4c7efd60] do_huge_pmd_anonymous_page at ffffffff962c6fcc
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/./include/linux/topology.h: 73
#10 [ffffa3cd4c7efdb0] __handle_mm_fault at ffffffff96272c3e
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/memory.c: 3788
#11 [ffffa3cd4c7efe78] handle_mm_fault at ffffffff9627369b
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/mm/memory.c: 4058
#12 [ffffa3cd4c7efeb0] __do_page_fault at ffffffff9604f8ee
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/arch/x86/mm/fault.c: 1457
#13 [ffffa3cd4c7eff20] do_page_fault at ffffffff9604fe27
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/./arch/x86/include/asm/jump_label.h: 23
#14 [ffffa3cd4c7eff50] page_fault at ffffffff96a00e8e
    /home/user/build/1packages/4used/kernel/linux-stable/makepkg_pacman/linux-stable/src/linux-stable/arch/x86/entry/entry_64.S: 1156
    RIP: 00005a46e611ac10  RSP: 00007fffa51f0f60  RFLAGS: 00010206
    RAX: 00000000098c0000  RBX: 000072d45bd40010  RCX: 000072d6aff243db
    RDX: 0000000000000000  RSI: 00000002540bf000  RDI: 000072d45bd40000
    RBP: 00005a46e611ba54   R8: 000072d45bd40010   R9: 0000000000000000
    R10: 0000000000000022  R11: 00000002540be400  R12: ffffffffffffffff
    R13: 0000000000000002  R14: 0000000000001000  R15: 00000002540be400
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash> bt -Tsx 3902
PID: 3902   TASK: ffff934f5264bd80  CPU: 1   COMMAND: "stress"
  [ffffa3cd4c99efb8] get_page_from_freelist+0xaa9 at ffffffff96291799
  [ffffa3cd4c99f108] __alloc_pages_slowpath+0x216 at ffffffff96292cb6
  [ffffa3cd4c99f130] get_page_from_freelist+0xb76 at ffffffff96291866
  [ffffa3cd4c99f178] get_page_from_freelist+0xaa9 at ffffffff96291799
  [ffffa3cd4c99f2e0] ZSTD_compressSequences_internal+0x8db at ffffffff9660a27b
  [ffffa3cd4c99f480] test_clear_page_writeback+0x12a at ffffffff9623ad8a
  [ffffa3cd4c99f4a8] trace_hardirqs_on+0x2c at ffffffff961bc30c
  [ffffa3cd4c99f4c8] test_clear_page_writeback+0x18c at ffffffff9623adec
  [ffffa3cd4c99f538] decay_load+0x6a at ffffffff96111d1a
  [ffffa3cd4c99f5a0] __list_add_valid+0x90 at ffffffff965e3ff0
  [ffffa3cd4c99f650] check_preempt_wakeup+0x267 at ffffffff960f7097
  [ffffa3cd4c99f6b8] decay_load+0x6a at ffffffff96111d1a
  [ffffa3cd4c99f6e0] __accumulate_pelt_segments+0x29 at ffffffff96111d89
  [ffffa3cd4c99f700] __update_load_avg_se+0x1cb at ffffffff961120ab
  [ffffa3cd4c99f738] update_load_avg+0xca at ffffffff960f73aa
  [ffffa3cd4c99f740] update_load_avg+0xca at ffffffff960f73aa
  [ffffa3cd4c99f7e0] switch_mm_irqs_off+0x270 at ffffffff96057920
  [ffffa3cd4c99f820] __schedule+0x51a at ffffffff969f471a
  [ffffa3cd4c99f880] preempt_schedule_common+0x15 at ffffffff969f4ed5
  [ffffa3cd4c99f898] _cond_resched+0x3f at ffffffff969f4f3f
  [ffffa3cd4c99f8a8] isolate_migratepages_block+0xf8 at ffffffff96260dd8
  [ffffa3cd4c99f978] compact_zone+0x4f9 at ffffffff96263eb9
  [ffffa3cd4c99fa38] compact_zone_order+0xe3 at ffffffff962648b3
  [ffffa3cd4c99faf0] try_to_compact_pages+0xde at ffffffff9626523e
  [ffffa3cd4c99fb60] __alloc_pages_direct_compact+0x8c at ffffffff9629285c
  [ffffa3cd4c99fbb8] __alloc_pages_slowpath+0x65c at ffffffff962930fc
  [ffffa3cd4c99fce8] __alloc_pages_nodemask+0x4cf at ffffffff962941ff
  [ffffa3cd4c99fd60] do_huge_pmd_anonymous_page+0x17c at ffffffff962c6fcc
  [ffffa3cd4c99fdb0] __handle_mm_fault+0xeee at ffffffff96272c3e
  [ffffa3cd4c99fe78] handle_mm_fault+0x17b at ffffffff9627369b
  [ffffa3cd4c99feb0] __do_page_fault+0x34e at ffffffff9604f8ee
  [ffffa3cd4c99ff20] do_page_fault+0x57 at ffffffff9604fe27
  [ffffa3cd4c99ff38] page_fault+0x8 at ffffffff96a00e78
  [ffffa3cd4c99ff50] page_fault+0x1e at ffffffff96a00e8e
    RIP: 00005a46e611ac10  RSP: 00007fffa51f0f60  RFLAGS: 00010206
    RAX: 000000000d0c0000  RBX: 000072d45bd40010  RCX: 000072d6aff243db
    RDX: 0000000000000000  RSI: 00000002540bf000  RDI: 000072d45bd40000
    RBP: 00005a46e611ba54   R8: 000072d45bd40010   R9: 0000000000000000
    R10: 0000000000000022  R11: 00000002540be400  R12: ffffffffffffffff
    R13: 0000000000000002  R14: 0000000000001000  R15: 00000002540be400
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash> bt -l 3902
PID: 3902   TASK: ffff934f5264bd80  CPU: 9   COMMAND: "stress"
(active)
crash> 

```
As can be seen, zstd frames show up on those stacks, so maybe that's why zram swap (compressed with zstd) is needed to reproduce this (more easily, or at all?).
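
If it helps confirm that connection, the compressor zram is using can be checked via sysfs; the algorithm in brackets is the active one:

```
# List zram's available compressors; the one in brackets is in use:
cat /sys/block/zram0/comp_algorithm
# e.g.: lzo lzo-rle lz4 lz4hc 842 [zstd]
```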


trace.log is still being compressed (via xz), but the stuck 'stress' processes using 100% CPU are only making this slower :)
Shall I send it when it's ready?


