Re: cephfs kernel 5.10.78 client crashes

On Fri, 2021-11-26 at 09:11 +0100, Andrej Filipcic wrote:
> Hi,
> 
> we are doing some extensive stress testing of cephfs client throughput. 
> Ceph is 16.2.6, and we have seen no issues on the ceph side. The client 
> specs:
> - kernel 5.10.78
> - RHEL8.4
> - 512GB memory, dual 32-core cpu
> - bonded dual 100Gb Mellanox ConnectX-5 card with OFED drivers
> - cephfs mount options: reltime, acl, nowsync
> - tcp tuning with 256MB max window, bbr congestion control
> 
> The client can handle 3.5GB/s sustained writes of several parallel 
> (10-20) large streams (1-10GB) to EC 16+3 pools, but after several 
> hours, kernel panics appeared (1st log), and one mds hung (2nd log). 
> That happened on 3 clients we were testing. Such crashes appear only on
> heavily loaded clients, while on moderate load (~1GB/s) it almost never
> happens. The stress test calls fdatasync after each file write.
> 
> Any ideas what is wrong here? ceph kernel bug or some client 
> misconfiguration?
> 
> Best regards,
> Andrej
> 
> The panic:
> 
> 2021-11-25 22:53:33 [ 8335.433436] P2P Transfer - : page allocation 
> failure: order:4, mode:0x40c40(GFP_NOFS|__GFP_COMP), 
> nodemask=(null),cpuset=/,mems_allowed=0-1
> 2021-11-25 22:53:33 [ 8335.445969] CPU: 17 PID: 142288 Comm: P2P 
> Transfer -  Tainted: G           O      5.10.78-2.el8.x86_64 #1

Hand built kernel, I take it? You may want to try the latest mainline
kernels, but it probably won't make a big difference here.

> 2021-11-25 22:53:33 [ 8335.455532] Hardware name: BULL 
> R282-Z90-00/MZ92-FS0-00, BIOS R16 07/10/2020
> 2021-11-25 22:53:33 [ 8335.462577] Call Trace:
> 2021-11-25 22:53:33 [ 8335.465044]  dump_stack+0x6d/0x88
> 2021-11-25 22:53:33 [ 8335.468358]  warn_alloc.cold.125+0x7b/0xdd
> 2021-11-25 22:53:33 [ 8335.472449]  ? _cond_resched+0x15/0x30
> 2021-11-25 22:53:33 [ 8335.476203]  ? 
> __alloc_pages_direct_compact+0x12f/0x140
> 2021-11-25 22:53:33 [ 8335.481428] 
>   __alloc_pages_slowpath.constprop.115+0xbcd/0xc00
> 2021-11-25 22:53:33 [ 8335.487185]  ? send_request+0x833/0xb20 [libceph]
> 2021-11-25 22:53:33 [ 8335.491889]  __alloc_pages_nodemask+0x2cc/0x300
> 2021-11-25 22:53:33 [ 8335.496420]  kmalloc_order+0x24/0xf0
> 2021-11-25 22:53:33 [ 8335.500001]  kmalloc_order_trace+0x19/0x80
> 2021-11-25 22:53:33 [ 8335.504109]  ceph_writepages_start+0x80e/0x1400 
> [ceph]
> 2021-11-25 22:53:33 [ 8335.509251]  do_writepages+0x41/0xd0
> 2021-11-25 22:53:33 [ 8335.512827]  ? __ip_queue_xmit+0x15c/0x3e0
> 2021-11-25 22:53:33 [ 8335.516929]  __filemap_fdatawrite_range+0xc7/0x100
> 2021-11-25 22:53:33 [ 8335.521720]  file_write_and_wait_range+0x5e/0xb0
> 2021-11-25 22:53:33 [ 8335.526345]  ceph_fsync+0x4c/0x470 [ceph]
> 2021-11-25 22:53:33 [ 8335.530352]  do_fsync+0x38/0x70
> 2021-11-25 22:53:33 [ 8335.533497]  __x64_sys_fdatasync+0x13/0x20
> 2021-11-25 22:53:33 [ 8335.537599]  do_syscall_64+0x33/0x40
> 2021-11-25 22:53:33 [ 8335.541177] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:53:33 [ 8335.546229] RIP: 0033:0x7fcfb725e55f
> 2021-11-25 22:53:33 [ 8335.549810] Code: 00 0f 05 48 3d 00 f0 ff ff 77 
> 40 c3 0f 1f 80 00 00 00 00 53 89 fb 48 83 ec 10 e8 4c c4 f8 ff 89 df 89 
> c2 b8 4b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 
> 8e c4 f8 ff 8b 44 24


Ouch. Failed allocation while trying to write back memory to handle an
fsync.

It needed memory to write back the pages, and the only way to get it was
by writing back pages. This is sort of the writeback nightmare scenario,
as it usually leads to a hard lockup on the box once nothing can get
memory, and you can't recover from it.

The usual method to deal with this is to prevent it from happening in
the first place by tuning writeback behavior. You may want to play with
the /proc/sys/vm/dirty_* tunables:

    https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html

In particular, you want to start writeback of dirty data in the
background sooner, so (e.g.) lowering dirty_background_ratio to 5 and
dirty_ratio to 10 may help. Also consider lowering
dirty_writeback_centisecs so the flusher threads wake up more often, and
dirty_expire_centisecs so dirty pages become eligible for writeout
sooner.
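
As a rough illustration of that tuning, here is a minimal Python sketch
that records the current values and then writes new ones. The two ratio
values are the examples given above; the dirty_writeback_centisecs value
of 100 is only an assumption for illustration (the kernel default is
500), and the helper names are hypothetical. It needs root to write
under /proc/sys/vm, and the settings do not persist across a reboot.

```python
from pathlib import Path

# dirty_background_ratio and dirty_ratio are the example values from the
# text above. The dirty_writeback_centisecs value is an assumption chosen
# for illustration only (kernel default: 500, i.e. 5 seconds).
SUGGESTED = {
    "dirty_background_ratio": 5,
    "dirty_ratio": 10,
    "dirty_writeback_centisecs": 100,
}


def read_tunables(names, base="/proc/sys/vm"):
    """Return the current integer value of each named vm tunable."""
    return {name: int((Path(base) / name).read_text().strip()) for name in names}


def apply_tunables(settings, base="/proc/sys/vm"):
    """Write each tunable and return the previous values for easy rollback."""
    previous = read_tunables(settings, base)
    for name, value in settings.items():
        (Path(base) / name).write_text(f"{value}\n")
    return previous
```

Calling apply_tunables(SUGGESTED) as root returns the old values, so
passing that result back to apply_tunables() restores them if the
change does not help. The base argument exists only so the helpers can
be pointed at a scratch directory for testing.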

Long term, it would be nice to get as much allocation out of the
writeback path as we can, but unfortunately that's not a simple thing to
change after the fact.
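
As an aside on the failed request itself: the warning reports order:4,
which with 4kB pages (an assumption, but the usual case on x86_64) means
2^4 = 16 contiguous pages, or 64kB. The sketch below is my own reading
aid, not part of the kernel or of ceph. It parses the per-zone free-list
lines from the quoted Mem-Info dump and counts free blocks large enough
for a given order, skipping blocks tagged only (H), which I take to be
the highatomic reserve. It ignores watermarks and lowmem_reserve, so
treat the result as a hint only.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages

# Free-list lines copied from the quoted log, with the mail wrapping undone.
NODE0_DMA32 = ("Node 0 DMA32: 54*4kB (UM) 58*8kB (UME) 60*16kB (UME) "
               "121*32kB (UME) 107*64kB (UME) 79*128kB (UME) 54*256kB (ME) "
               "56*512kB (UME) 50*1024kB (ME) 69*2048kB (UM) 187*4096kB (UM) "
               "= 1023432kB")
NODE0_NORMAL = ("Node 0 Normal: 551149*4kB (UME) 104631*8kB (UME) "
                "11220*16kB (UME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB "
                "0*1024kB 1*2048kB (H) 0*4096kB = 3223212kB")
NODE1_NORMAL = ("Node 1 Normal: 11*4kB (M) 53961*8kB (UME) 109253*16kB (UME) "
                "1*32kB (H) 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) "
                "1*1024kB (H) 0*2048kB 0*4096kB = 2181796kB")

# Matches entries such as "107*64kB (UME)" or "0*32kB".
_ENTRY = re.compile(r"(\d+)\*(\d+)kB(?:\s+\(([A-Z]+)\))?")


def order_size_kb(order, page_kb=PAGE_KB):
    """Size in kB of one block of the given allocation order."""
    return page_kb << order


def parse_free_list(line):
    """Return (count, size_kb, type_letters) for each entry on the line."""
    return [(int(c), int(s), t) for c, s, t in _ENTRY.findall(line)]


def usable_blocks(line, order, reserved="H"):
    """Count free blocks big enough for `order`, excluding reserved-only ones."""
    need = order_size_kb(order)
    return sum(count for count, size, types in parse_free_list(line)
               if size >= need and set(types) - set(reserved))


for zone in (NODE0_DMA32, NODE0_NORMAL, NODE1_NORMAL):
    print(zone.split(":")[0], usable_blocks(zone, order=4))
```

By this count, neither Normal zone has a single free block of 64kB or
larger outside the (H) entries, even though each still reports a couple
of GB free in small blocks. That would be consistent with fragmentation
rather than an outright lack of free memory.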

> 2021-11-25 22:53:33 [ 8335.568554] RSP: 002b:00007fce05ddb790 EFLAGS: 
> 00000293 ORIG_RAX: 000000000000004b
> 2021-11-25 22:53:33 [ 8335.576120] RAX: ffffffffffffffda RBX: 
> 0000000000000c4c RCX: 00007fcfb725e55f
> 2021-11-25 22:53:33 [ 8335.583251] RDX: 0000000000000000 RSI: 
> 00007fce05ddb7d0 RDI: 0000000000000c4c
> 2021-11-25 22:53:33 [ 8335.590375] RBP: 00007fce05ddb7c0 R08: 
> 0000000451655248 R09: 0000000451655160
> 2021-11-25 22:53:33 [ 8335.597499] R10: 0000000000001a7a R11: 
> 0000000000000293 R12: 0000000000000000
> 2021-11-25 22:53:33 [ 8335.604623] R13: 0000000800145668 R14: 
> 00007fce05ddb800 R15: 00007fcda0019000
> 2021-11-25 22:53:33 [ 8335.611758] Mem-Info:
> 2021-11-25 22:53:33 [ 8335.614064] active_anon:993 inactive_anon:9552068 
> isolated_anon:0
> 2021-11-25 22:53:33 [ 8335.614064]  active_file:1717317 
> inactive_file:114691936 isolated_file:0
> 2021-11-25 22:53:33 [ 8335.614064]  unevictable:0 dirty:10350536 
> writeback:545812
> 2021-11-25 22:53:33 [ 8335.614064]  slab_reclaimable:2036011 
> slab_unreclaimable:365944
> 2021-11-25 22:53:33 [ 8335.614064]  mapped:21846 shmem:2368 
> pagetables:26525 bounce:0
> 2021-11-25 22:53:33 [ 8335.614064]  free:1568016 free_pcp:5646 free_cma:0
> 2021-11-25 22:53:33 [ 8335.648764] Node 0 active_anon:3416kB 
> inactive_anon:24035756kB active_file:4089036kB inactive_file:226502860kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:79528kB 
> dirty:4849284kB writeback:25168kB shmem:8932kB shmem_thp: 0kB 
> shmem_pmdmapped: 0kB anon_thp: 20529152kB writeback_tmp:0kB 
> kernel_stack:40800kB all_unreclaimable? no
> 2021-11-25 22:53:33 [ 8335.679048] Node 1 active_anon:556kB 
> inactive_anon:14172516kB active_file:2780232kB inactive_file:232247192kB 
> unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:7856kB 
> dirty:36318296kB writeback:2365276kB shmem:540kB shmem_thp: 0kB 
> shmem_pmdmapped: 0kB anon_thp: 10524672kB writeback_tmp:0kB 
> kernel_stack:30048kB all_unreclaimable? no
> 2021-11-25 22:53:33 [ 8335.709344] Node 0 DMA free:11264kB min:0kB 
> low:12kB high:24kB reserved_highatomic:0KB active_anon:0kB 
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB 
> writepending:0kB present:15996kB managed:15360kB mlocked:0kB 
> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> 2021-11-25 22:53:33 [ 8335.735551] lowmem_reserve[]: 0 2104 257776 
> 257776 257776
> 2021-11-25 22:53:33 [ 8335.740968] Node 0 DMA32 free:1023336kB min:368kB 
> low:2488kB high:4608kB reserved_highatomic:0KB active_anon:0kB 
> inactive_anon:1104112kB active_file:456kB inactive_file:40kB 
> unevictable:0kB writepending:0kB present:2221440kB managed:2154840kB 
> mlocked:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB 
> free_cma:0kB
> 2021-11-25 22:53:33 [ 8335.769173] lowmem_reserve[]: 0 0 255672 255672 
> 255672
> 2021-11-25 22:53:33 [ 8335.774330] Node 0 Normal free:3252448kB 
> min:45564kB low:307364kB high:569164kB reserved_highatomic:2048KB 
> active_anon:3416kB inactive_anon:22931620kB active_file:4088580kB 
> inactive_file:226540064kB unevictable:0kB writepending:4872056kB 
> present:266061824kB managed:261808384kB mlocked:0kB pagetables:61080kB 
> bounce:0kB free_pcp:18444kB local_pcp:1376kB free_cma:0kB
> 2021-11-25 22:53:33 [ 8335.806515] lowmem_reserve[]: 0 0 0 0 0
> 2021-11-25 22:53:33 [ 8335.810369] Node 1 Normal free:2297488kB 
> min:107424kB low:371624kB high:635824kB reserved_highatomic:2048KB 
> active_anon:556kB inactive_anon:14172532kB active_file:2780284kB 
> inactive_file:231750564kB unevictable:0kB writepending:38331120kB 
> present:268433408kB managed:264200892kB mlocked:0kB pagetables:45020kB 
> bounce:0kB free_pcp:15540kB local_pcp:0kB free_cma:0kB
> 2021-11-25 22:53:33 [ 8335.842381] lowmem_reserve[]: 0 0 0 0 0
> 2021-11-25 22:53:33 [ 8335.846225] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 
> 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 
> 11264kB
> 2021-11-25 22:53:33 [ 8335.857874] Node 0 DMA32: 54*4kB (UM) 58*8kB 
> (UME) 60*16kB (UME) 121*32kB (UME) 107*64kB (UME) 79*128kB (UME) 
> 54*256kB (ME) 56*512kB (UME) 50*1024kB (ME) 69*2048kB (UM) 187*4096kB 
> (UM) = 1023432kB
> 2021-11-25 22:53:33 [ 8335.875329] Node 0 Normal: 551149*4kB (UME) 
> 104631*8kB (UME) 11220*16kB (UME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 
> 0*1024kB 1*2048kB (H) 0*4096kB = 3223212kB
> 2021-11-25 22:53:33 [ 8335.889493] Node 1 Normal: 11*4kB (M) 53961*8kB 
> (UME) 109253*16kB (UME) 1*32kB (H) 1*64kB (H) 1*128kB (H) 1*256kB (H) 
> 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 2181796kB

So here it shows that there is 
> 2021-11-25 22:53:34 [ 8335.904877] Node 0 hugepages_total=0 
> hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> 2021-11-25 22:53:34 [ 8335.913577] Node 0 hugepages_total=0 
> hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 2021-11-25 22:53:34 [ 8335.922013] Node 1 hugepages_total=0 
> hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> 2021-11-25 22:53:34 [ 8335.930711] Node 1 hugepages_total=0 
> hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 2021-11-25 22:53:34 [ 8335.939140] 116162432 total pagecache pages
> 2021-11-25 22:53:34 [ 8335.943330] 36 pages in swap cache
> 2021-11-25 22:53:34 [ 8335.946445] general protection fault, probably 
> for non-canonical address 0x473ea8b1095ffa30: 0000 [#1] SMP NOPTI
> 2021-11-25 22:53:34 [ 8335.946737] Swap cache stats: add 533, delete 
> 497, find 46/164
> 2021-11-25 22:53:34 [ 8335.956894] CPU: 102 PID: 94937 Comm: 
> kworker/102:5 Tainted: G           O      5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:53:34 [ 8335.956896] Hardware name: BULL 
> R282-Z90-00/MZ92-FS0-00, BIOS R16 07/10/2020
> 2021-11-25 22:53:34 [ 8335.956911] Workqueue: ceph-msgr ceph_con_workfn 
> [libceph]
> 2021-11-25 22:53:34 [ 8335.962750] Free swap  = 4182012kB
> 2021-11-25 22:53:34 [ 8335.972190] RIP: 
> 0010:writepages_finish+0x224/0x450 [ceph]
> 2021-11-25 22:53:34 [ 8335.972195] Code: 54 24 0c 85 d2 74 68 4c 89 ff 
> e8 e7 b2 56 f5 48 3b 1c 24 74 75 49 8b 45 08 4c 8b 3c 18 48 83 c3 08 4d 
> 85 ff 0f 84 f8 00 00 00 <49> 8b 57 08 48 8d 42 ff 83 e2 01 49 0f 44 c7 
> 48 8b 00 a8 04 0f 84
> 2021-11-25 22:53:34 [ 8335.979244] Total swap = 4189180kB
> 2021-11-25 22:53:34 [ 8335.984715] RSP: 0018:ffffb88718217cc0 EFLAGS: 
> 00010206
> 2021-11-25 22:53:34 [ 8335.984717] RAX: ffff988ee8e18000 RBX: 
> 0000000000008008 RCX: 0000000000031489
> 2021-11-25 22:53:34 [ 8335.984719] RDX: ffffe09cb0ce9b87 RSI: 
> 0000000000000000 RDI: ffffe09cb0ce9bc0
> 2021-11-25 22:53:34 [ 8335.984720] RBP: ffff988e061f2710 R08: 
> 0000000000031468 R09: 0000000000000020
> 2021-11-25 22:53:34 [ 8335.984720] R10: 00000000000000d7 R11: 
> 0000000000000001 R12: ffff98bc2d211430
> 2021-11-25 22:53:34 [ 8335.984721] R13: ffff988ea9769ad8 R14: 
> ffff988e061f26c0 R15: 473ea8b1095ffa30
> 2021-11-25 22:53:34 [ 8335.984722] FS:  0000000000000000(0000) 
> GS:ffff98ccee580000(0000) knlGS:0000000000000000
> 2021-11-25 22:53:34 [ 8335.984723] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> 2021-11-25 22:53:34 [ 8335.984727] CR2: 00007fcf5006aac8 CR3: 
> 000000016ad60000 CR4: 0000000000350ee0
> 2021-11-25 22:53:34 [ 8335.988134] 134183167 pages RAM
> 2021-11-25 22:53:34 [ 8335.993607] Call Trace:
> 2021-11-25 22:53:34 [ 8335.993620]  __complete_request+0x22/0x70 [libceph]
> 2021-11-25 22:53:34 [ 8335.993630]  dispatch+0x15e/0xb40 [libceph]
> 2021-11-25 22:53:34 [ 8336.012367] 0 pages HighMem/MovableOnly
> 2021-11-25 22:53:34 [ 8336.015766]  ? 
> ceph_x_check_message_signature+0x54/0xc0 [libceph]
> 2021-11-25 22:53:34 [ 8336.015774]  ? read_partial_message+0x214/0x770 
> [libceph]
> 2021-11-25 22:53:34 [ 8336.020998] 2138298 pages reserved
> 2021-11-25 22:53:34 [ 8336.028131]  try_read+0x77a/0x1190 [libceph]
> 2021-11-25 22:53:34 [ 8336.028139]  ceph_con_workfn+0x10f/0x610 [libceph]
> 2021-11-25 22:53:34 [ 8336.028145]  ? __schedule+0x299/0x7a0
> 2021-11-25 22:53:34 [ 8336.035276] 0 pages hwpoisoned
> 2021-11-25 22:53:34 [ 8336.042401]  process_one_work+0x1fb/0x390
> 2021-11-25 22:53:34 [ 8336.042403]  worker_thread+0x2d/0x3e0
> 2021-11-25 22:53:34 [ 8336.042405]  ? process_one_work+0x390/0x390
> 2021-11-25 22:53:34 [ 8336.042406]  kthread+0x116/0x130
> 2021-11-25 22:53:34 [ 8336.042408]  ? kthread_park+0x80/0x80
> 2021-11-25 22:53:34 [ 8336.042414]  ret_from_fork+0x22/0x30
> 2021-11-25 22:53:34 [ 8336.149055] Modules linked in: binfmt_misc ceph 
> xfs rbd libceph dns_resolver 8021q garp mrp stp llc bonding 
> nft_reject_inet nf_reject_ipv4 sch_fq nf_reject_ipv6 nft_reject rfkill 
> nft_limit rdma_ucm(O) rdma_cm(O) iw_cm(O) nft_ct nf_conntrack 
> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables ib_ipoib(O) libcrc32c ib_cm(O) 
> nfnetlink ib_umad(O) sunrpc vfat fat ipmi_ssif
> amd64_edac_mod edac_mce_amd amd_energy kvm_amd kvm irqbypass 
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl pcspkr ast 
> drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm syscopyarea 
> sysfillrect sp5100_tco sysimgblt acpi_ipmi fb_sys_fops ccp i2c_piix4 
> k10temp ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq tcp_bbr 
> ip_tables ext4 mbcache jbd2 mlx5_ib(O) ib_uverbs(O)
> ib_core(O) raid1 sd_mod t10_pi sg crc32c_intel mlx5_core(O) mlxfw(O) 
> pci_hyperv_intf tls ahci igb libahci psample mlxdevm(O) i2c_algo_bit 
> auxiliary(O) libata dca mlx_compat(O) pinctrl_amd
> 2021-11-25 22:53:34 [ 8336.229967] general protection fault, probably 
> for non-canonical address 0x790fa7461edf3e0a: 0000 [#2] SMP NOPTI
> 2021-11-25 22:53:34 [ 8336.229999] ---[ end trace aa094bc887e83e0e ]---
> 2021-11-25 22:53:34 [ 8336.240134] CPU: 60 PID: 110782 Comm: 
> kworker/60:5 Tainted: G      D    O      5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:53:34 [ 8336.240136] Hardware name: BULL 
> R282-Z90-00/MZ92-FS0-00, BIOS R16 07/10/2020
> 2021-11-25 22:53:34 [ 8336.240200] Workqueue: ceph-msgr ceph_con_workfn 
> [libceph]
> 2021-11-25 22:53:34 [ 8336.240207] RIP: 0010:ceph_tcp_sendpage+0x5/0x80 
> [libceph]
> 2021-11-25 22:53:35 [ 8336.240209] Code: 00 00 00 e8 ed 89 8a f5 eb e2 
> 0f 0b 0f 0b e8 e2 f7 ff ff be 02 00 00 00 e8 d8 89 8a f5 eb cd 66 0f 1f 
> 44 00 00 0f 1f 44 00 00 <4c> 8b 4e 08 41 81 c8 40 40 00 00 49 8d 41 ff 
> 41 83 e1 01 48 0f 44
> 2021-11-25 22:53:35 [ 8336.240210] RSP: 0018:ffffb886d94b7db0 EFLAGS: 
> 00010206
> 2021-11-25 22:53:35 [ 8336.240211] RAX: 0000000000008000 RBX: 
> ffff988521b89268 RCX: 0000000000000d2c
> 2021-11-25 22:53:35 [ 8336.240212] RDX: 00000000000002d4 RSI: 
> 790fa7461edf3e0a RDI: ffff984eb581d7c0
> 2021-11-25 22:53:35 [ 8336.240212] RBP: 0000000000028000 R08: 
> 0000000000028000 R09: ffff9850980fdd0c
> 2021-11-25 22:53:35 [ 8336.240213] R10: 0000000000000000 R11: 
> 0000000000000000 R12: ffff988521b892e8
> 2021-11-25 22:53:35 [ 8336.240213] R13: 790fa7461edf3e0a R14: 
> ffff988faa62b830 R15: 0000000000000000
> 2021-11-25 22:53:35 [ 8336.240214] FS:  0000000000000000(0000) 
> GS:ffff98ccee300000(0000) knlGS:0000000000000000
> 2021-11-25 22:53:35 [ 8336.240215] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> 2021-11-25 22:53:35 [ 8336.240215] CR2: 00007efb141b6000 CR3: 
> 000000016ad60000 CR4: 0000000000350ee0
> 2021-11-25 22:53:35 [ 8336.240216] Call Trace:
> 2021-11-25 22:53:35 [ 8336.240225]  try_write+0x129/0xb90 [libceph]
> 2021-11-25 22:53:35 [ 8336.240232]  ? try_read+0x36c/0x1190 [libceph]
> 2021-11-25 22:53:35 [ 8336.240237]  ? set_next_entity+0xa6/0x1d0
> 2021-11-25 22:53:35 [ 8336.240243]  ceph_con_workfn+0x32d/0x610 [libceph]
> 2021-11-25 22:53:35 [ 8336.240247]  ? __schedule+0x299/0x7a0
> 2021-11-25 22:53:35 [ 8336.240249]  process_one_work+0x1fb/0x390
> 2021-11-25 22:53:35 [ 8336.240253]  worker_thread+0x2d/0x3e0
> 2021-11-25 22:53:35 [ 8336.325025] RIP: 
> 0010:writepages_finish+0x224/0x450 [ceph]
> 2021-11-25 22:53:35 [ 8336.329880]  ? process_one_work+0x390/0x390
> 2021-11-25 22:53:35 [ 8336.329882]  kthread+0x116/0x130
> 2021-11-25 22:53:35 [ 8336.329884]  ? kthread_park+0x80/0x80
> 2021-11-25 22:53:35 [ 8336.329888]  ret_from_fork+0x22/0x30
> 2021-11-25 22:53:35 [ 8336.329890] Modules linked in: binfmt_misc ceph 
> xfs rbd libceph dns_resolver 8021q garp mrp stp llc bonding 
> nft_reject_inet nf_reject_ipv4 sch_fq nf_reject_ipv6 nft_reject rfkill 
> nft_limit rdma_ucm(O) rdma_cm(O) iw_cm(O) nft_ct nf_conntrack 
> nf_defrag_ipv6 nf_defrag_ipv4 nf_tables ib_ipoib(O) libcrc32c ib_cm(O) 
> nfnetlink ib_umad(O) sunrpc vfat fat ipmi_ssif
> amd64_edac_mod edac_mce_amd amd_energy kvm_amd kvm irqbypass 
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl pcspkr ast 
> drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm syscopyarea 
> sysfillrect sp5100_tco sysimgblt acpi_ipmi fb_sys_fops ccp i2c_piix4 
> k10temp ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq tcp_bbr 
> ip_tables ext4 mbcache jbd2 mlx5_ib(O) ib_uverbs(O)
> ib_core(O) raid1 sd_mod t10_pi sg crc32c_intel mlx5_core(O) mlxfw(O) 
> pci_hyperv_intf
> 2021-11-25 22:53:35 [ 8336.337031] Code: 54 24 0c 85 d2 74 68 4c 89 ff 
> e8 e7 b2 56 f5 48 3b 1c 24 74 75 49 8b 45 08 4c 8b 3c 18 48 83 c3 08 4d 
> 85 ff 0f 84 f8 00 00 00 <49> 8b 57 08 48 8d 42 ff 83 e2 01 49 0f 44 c7 
> 48 8b 00 a8 04 0f 84
> 2021-11-25 22:53:35 [ 8336.342469]  tls ahci igb libahci psample 
> mlxdevm(O) i2c_algo_bit auxiliary(O) libata dca mlx_compat(O) pinctrl_amd
> 2021-11-25 22:53:35 [ 8336.342647] ---[ end trace aa094bc887e83e0f ]---
> 2021-11-25 22:53:35 [ 8336.348078] RSP: 0018:ffffb88718217cc0 EFLAGS: 
> 00010206
> 2021-11-25 22:53:35 [ 8336.348080] RAX: ffff988ee8e18000 RBX: 
> 0000000000008008 RCX: 0000000000031489
> 2021-11-25 22:53:35 [ 8336.348081] RDX: ffffe09cb0ce9b87 RSI: 
> 0000000000000000 RDI: ffffe09cb0ce9bc0
> 2021-11-25 22:53:35 [ 8336.348082] RBP: ffff988e061f2710 R08: 
> 0000000000031468 R09: 0000000000000020
> 2021-11-25 22:53:35 [ 8336.348083] R10: 00000000000000d7 R11: 
> 0000000000000001 R12: ffff98bc2d211430
> 2021-11-25 22:53:35 [ 8336.348083] R13: ffff988ea9769ad8 R14: 
> ffff988e061f26c0 R15: 473ea8b1095ffa30
> 2021-11-25 22:53:35 [ 8336.348084] FS:  0000000000000000(0000) 
> GS:ffff98ccee580000(0000) knlGS:0000000000000000
> 2021-11-25 22:53:35 [ 8336.348085] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> 2021-11-25 22:53:35 [ 8336.348086] CR2: 00007fcf5006aac8 CR3: 
> 000000016ad60000 CR4: 0000000000350ee0
> 2021-11-25 22:53:35 [ 8336.348087] Kernel panic - not syncing: Fatal 
> exception
> 2021-11-25 22:53:35 [ 8336.349344] Kernel Offset: 0x35200000 from 
> 0xffffffff81000000 (relocation range: 
> 0xffffffff80000000-0xffffffffbfffffff)
> 2021-11-25 22:53:35 [ 8337.778774] ---[ end Kernel panic - not syncing: 
> Fatal exception ]---
> 
> 
> The hung mds:
> 
> 2021-11-25 21:20:43 [  205.144673] libceph: client80562712 fsid 
> 31450363-7461-457f-be77-68dc1740f718
> 2021-11-25 22:12:39 [ 3321.421663] INFO: task node_exporter:4007 blocked 
> for more than 122 seconds.
> 2021-11-25 22:12:39 [ 3321.428713]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:39 [ 3321.435152] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:39 [ 3321.442976] task:node_exporter   state:D stack: 
>     0 pid: 4007 ppid:     1 flags:0x00004080
> 2021-11-25 22:12:39 [ 3321.451326] Call Trace:
> 2021-11-25 22:12:39 [ 3321.453784]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:39 [ 3321.457277]  schedule+0x3c/0xa0
> 2021-11-25 22:12:39 [ 3321.460422]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:39 [ 3321.465196]  ceph_quota_update_statfs+0x28/0x140 
> [ceph]
> 2021-11-25 22:12:39 [ 3321.470434]  ceph_statfs+0x13f/0x160 [ceph]
> 2021-11-25 22:12:39 [ 3321.474625]  statfs_by_dentry+0x67/0x90
> 2021-11-25 22:12:39 [ 3321.478468]  vfs_statfs+0x16/0xd0
> 2021-11-25 22:12:39 [ 3321.481786]  user_statfs+0x54/0xa0
> 2021-11-25 22:12:39 [ 3321.485198]  __do_sys_statfs+0x20/0x50
> 2021-11-25 22:12:39 [ 3321.488955]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:39 [ 3321.492695] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:39 [ 3321.497749] RIP: 0033:0x4b25fb
> 2021-11-25 22:12:39 [ 3321.500811] RSP: 002b:000000c0014df418 EFLAGS: 
> 00000212 ORIG_RAX: 0000000000000089
> 2021-11-25 22:12:39 [ 3321.508380] RAX: ffffffffffffffda RBX: 
> 000000c000043000 RCX: 00000000004b25fb
> 2021-11-25 22:12:39 [ 3321.515517] RDX: 0000000000000000 RSI: 
> 000000c0014df570 RDI: 000000c000cae350
> 2021-11-25 22:12:39 [ 3321.522648] RBP: 000000c0014df478 R08: 
> 0000000000ab8601 R09: 0000000000000001
> 2021-11-25 22:12:39 [ 3321.529781] R10: 000000c000cae350 R11: 
> 0000000000000212 R12: 0000000000000036
> 2021-11-25 22:12:39 [ 3321.536915] R13: 0000000000000035 R14: 
> 0000000000000200 R15: 0000000000000055
> 2021-11-25 22:12:39 [ 3321.544105] INFO: task System-1:5514 blocked for 
> more than 123 seconds.
> 2021-11-25 22:12:39 [ 3321.550723]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:39 [ 3321.557160] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:39 [ 3321.564986] task:System-1        state:D stack: 
>     0 pid: 5514 ppid:  4294 flags:0x00004080
> 2021-11-25 22:12:39 [ 3321.573334] Call Trace:
> 2021-11-25 22:12:39 [ 3321.575784]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:39 [ 3321.579276]  schedule+0x3c/0xa0
> 2021-11-25 22:12:39 [ 3321.582425]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:39 [ 3321.587135]  ceph_quota_update_statfs+0x28/0x140 
> [ceph]
> 2021-11-25 22:12:39 [ 3321.592369]  ceph_statfs+0x13f/0x160 [ceph]
> 2021-11-25 22:12:39 [ 3321.596735]  statfs_by_dentry+0x67/0x90
> 2021-11-25 22:12:39 [ 3321.600573]  vfs_statfs+0x16/0xd0
> 2021-11-25 22:12:39 [ 3321.603892]  user_statfs+0x54/0xa0
> 2021-11-25 22:12:39 [ 3321.607299]  __do_sys_statfs+0x20/0x50
> 2021-11-25 22:12:39 [ 3321.611051]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:39 [ 3321.614628] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:39 [ 3321.619683] RIP: 0033:0x7f8e8026bfdb
> 2021-11-25 22:12:39 [ 3321.623260] RSP: 002b:00007f8c9b1f36d8 EFLAGS: 
> 00000246 ORIG_RAX: 0000000000000089
> 2021-11-25 22:12:39 [ 3321.630827] RAX: ffffffffffffffda RBX: 
> 00007f8c680100f0 RCX: 00007f8e8026bfdb
> 2021-11-25 22:12:39 [ 3321.637964] RDX: 00007f8c680100f0 RSI: 
> 00007f8c9b1f36e0 RDI: 00007f8c680100f0
> 2021-11-25 22:12:39 [ 3321.645100] RBP: 00007f8c9b1f36e0 R08: 
> 0000000000000000 R09: 000000008b184681
> 2021-11-25 22:12:39 [ 3321.652233] R10: 00007f8e693f4ca5 R11: 
> 0000000000000246 R12: 00007f8c9b1f3780
> 2021-11-25 22:12:39 [ 3321.659368] R13: 00007f8c680100f0 R14: 
> 00007f8c9b1f3828 R15: 00007f8c6400b800
> 2021-11-25 22:12:39 [ 3321.666518] INFO: task healthcheck-sch:7738 
> blocked for more than 123 seconds.
> 2021-11-25 22:12:39 [ 3321.673738]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:39 [ 3321.680173] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:39 [ 3321.687999] task:healthcheck-sch state:D stack: 
>     0 pid: 7738 ppid:  4294 flags:0x00000080
> 2021-11-25 22:12:39 [ 3321.696346] Call Trace:
> 2021-11-25 22:12:39 [ 3321.698802]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:39 [ 3321.702295]  schedule+0x3c/0xa0
> 2021-11-25 22:12:39 [ 3321.705438]  rwsem_down_write_slowpath+0x2f0/0x4a0
> 2021-11-25 22:12:39 [ 3321.710235]  ? handle_mm_fault+0xb1f/0xb70
> 2021-11-25 22:12:39 [ 3321.714332]  do_unlinkat+0x140/0x300
> 2021-11-25 22:12:39 [ 3321.717912]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:39 [ 3321.721489] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:39 [ 3321.726539] RIP: 0033:0x7f8e8026e16b
> 2021-11-25 22:12:39 [ 3321.730119] RSP: 002b:00007f8c98c6b0f8 EFLAGS: 
> 00000206 ORIG_RAX: 0000000000000057
> 2021-11-25 22:12:39 [ 3321.737687] RAX: ffffffffffffffda RBX: 
> 00007f8c70dca348 RCX: 00007f8e8026e16b
> 2021-11-25 22:12:39 [ 3321.744818] RDX: 00007f8c28002ec0 RSI: 
> 00007f8c98c6b188 RDI: 00007f8c28002ec0
> 2021-11-25 22:12:39 [ 3321.751951] RBP: 00007f8c98c6b110 R08: 
> 0000000081e598f9 R09: 000000008b005a8c
> 2021-11-25 22:12:39 [ 3321.759096] R10: 00007f8e608a16e1 R11: 
> 0000000000000206 R12: 0000000000000000
> 2021-11-25 22:12:39 [ 3321.766229] R13: 00007f8e21d7fa08 R14: 
> 00007f8c98c6b1a0 R15: 00007f8c70dca000
> 2021-11-25 22:12:39 [ 3321.773369] INFO: task P2P Transfer - :53802 
> blocked for more than 123 seconds.
> 2021-11-25 22:12:39 [ 3321.780679]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:39 [ 3321.787121] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:39 [ 3321.794948] task:P2P Transfer -  state:D stack: 
>     0 pid:53802 ppid:  4294 flags:0x00000082
> 2021-11-25 22:12:39 [ 3321.803294] Call Trace:
> 2021-11-25 22:12:39 [ 3321.805747]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:39 [ 3321.809241]  schedule+0x3c/0xa0
> 2021-11-25 22:12:39 [ 3321.812383]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:39 [ 3321.817098]  check_quota_exceeded+0x64/0x220 [ceph]
> 2021-11-25 22:12:39 [ 3321.821986]  ceph_write_iter+0x1bf/0xc90 [ceph]
> 2021-11-25 22:12:39 [ 3321.826523]  ? tcp_recvmsg+0x63e/0xb80
> 2021-11-25 22:12:39 [ 3321.830279]  ? inet6_recvmsg+0x5e/0x110
> 2021-11-25 22:12:39 [ 3321.834123]  ? sock_recvmsg+0x1c/0x70
> 2021-11-25 22:12:39 [ 3321.837789]  ? new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:39 [ 3321.841801]  new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:39 [ 3321.845644]  vfs_write+0x1bd/0x270
> 2021-11-25 22:12:39 [ 3321.849046]  ksys_write+0x59/0xd0
> 2021-11-25 22:12:39 [ 3321.852365]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:39 [ 3321.855945]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:39 [ 3321.860997] RIP: 0033:0x7f8e8096a847
> 2021-11-25 22:12:39 [ 3321.864578] RSP: 002b:00007f8a05c9c5c0 EFLAGS: 
> 00000293 ORIG_RAX: 0000000000000001
> 2021-11-25 22:12:39 [ 3321.872145] RAX: ffffffffffffffda RBX: 
> 0000000000000c4e RCX: 00007f8e8096a847
> 2021-11-25 22:12:39 [ 3321.879275] RDX: 0000000000002000 RSI: 
> 00007f8904001bf0 RDI: 0000000000000c4e
> 2021-11-25 22:12:39 [ 3321.886410] RBP: 00007f8904001bf0 R08: 
> 0000000000000000 R09: 000000043389c7e0
> 2021-11-25 22:12:39 [ 3321.893542] R10: 00000000000012b8 R11: 
> 0000000000000293 R12: 0000000000002000
> 2021-11-25 22:12:39 [ 3321.900674] R13: 00007f8904001bf0 R14: 
> 00007f8a05c9c650 R15: 00007f88e4015800
> 2021-11-25 22:12:39 [ 3321.907809] INFO: task vega_izum_si_03:57727 
> blocked for more than 123 seconds.
> 2021-11-25 22:12:39 [ 3321.915116]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:39 [ 3321.921553] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:39 [ 3321.929376] task:vega_izum_si_03 state:D stack: 
>     0 pid:57727 ppid:  4294 flags:0x00000080
> 2021-11-25 22:12:39 [ 3321.937725] Call Trace:
> 2021-11-25 22:12:39 [ 3321.940176]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:39 [ 3321.943675]  ? __touch_cap+0x1f/0xd0 [ceph]
> 2021-11-25 22:12:39 [ 3321.947865]  schedule+0x3c/0xa0
> 2021-11-25 22:12:39 [ 3321.951009]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:39 [ 3321.955716]  ? lookup_fast+0xae/0x150
> 2021-11-25 22:12:39 [ 3321.959390]  walk_component+0x129/0x1b0
> 2021-11-25 22:12:39 [ 3321.963231]  ? path_init+0x2ef/0x360
> 2021-11-25 22:12:39 [ 3321.966808]  path_lookupat.isra.42+0x67/0x140
> 2021-11-25 22:12:39 [ 3321.971171]  ? futex_wait+0x19a/0x230
> 2021-11-25 22:12:39 [ 3321.974834]  filename_lookup.part.56+0xa0/0x170
> 2021-11-25 22:12:39 [ 3321.979371]  ? __check_object_size+0x162/0x180
> 2021-11-25 22:12:39 [ 3321.983816]  ? strncpy_from_user+0x46/0x1e0
> 2021-11-25 22:12:39 [ 3321.988000]  vfs_statx+0x72/0x110
> 2021-11-25 22:12:39 [ 3321.991320]  __do_sys_newstat+0x39/0x70
> 2021-11-25 22:12:39 [ 3321.995160]  ? 
> syscall_trace_enter.isra.19+0x123/0x190
> 2021-11-25 22:12:39 [ 3322.000308]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:39 [ 3322.003887] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:39 [ 3322.008940] RIP: 0033:0x7f8e8026ba79
> 2021-11-25 22:12:39 [ 3322.012519] RSP: 002b:00007f89268e4048 EFLAGS: 
> 00000246 ORIG_RAX: 0000000000000004
> 2021-11-25 22:12:39 [ 3322.020084] RAX: ffffffffffffffda RBX: 
> 00007f89268e4050 RCX: 00007f8e8026ba79
> 2021-11-25 22:12:39 [ 3322.027222] RDX: 00007f89268e4050 RSI: 
> 00007f89268e4050 RDI: 00007f88d80011e0
> 2021-11-25 22:12:39 [ 3322.034349] RBP: 00007f89268e4100 R08: 
> 0000000000000000 R09: 000000008bad28f1
> 2021-11-25 22:12:39 [ 3322.041483] R10: 00007f8e687103a5 R11: 
> 0000000000000246 R12: 00007f88d80011e0
> 2021-11-25 22:12:39 [ 3322.048614] R13: 00007f8c5409bb48 R14: 
> 00007f89268e4118 R15: 00007f8c5409b800
> 2021-11-25 22:12:39 [ 3322.055760] INFO: task P2P Transfer - :58084 
> blocked for more than 123 seconds.
> 2021-11-25 22:12:39 [ 3322.063072]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:39 [ 3322.069511] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:39 [ 3322.077335] task:P2P Transfer -  state:D stack: 
>     0 pid:58084 ppid:  4294 flags:0x00000082
> 2021-11-25 22:12:40 [ 3322.085681] Call Trace:
> 2021-11-25 22:12:40 [ 3322.088135]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:40 [ 3322.091628]  schedule+0x3c/0xa0
> 2021-11-25 22:12:40 [ 3322.094773]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:40 [ 3322.099487]  check_quota_exceeded+0x64/0x220 [ceph]
> 2021-11-25 22:12:40 [ 3322.104372]  ceph_write_iter+0x1bf/0xc90 [ceph]
> 2021-11-25 22:12:40 [ 3322.108911]  ? tcp_recvmsg+0x63e/0xb80
> 2021-11-25 22:12:40 [ 3322.112661]  ? inet6_recvmsg+0x5e/0x110
> 2021-11-25 22:12:40 [ 3322.116501]  ? sock_recvmsg+0x1c/0x70
> 2021-11-25 22:12:40 [ 3322.120168]  ? new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:40 [ 3322.124181]  new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:40 [ 3322.128019]  vfs_write+0x1bd/0x270
> 2021-11-25 22:12:40 [ 3322.131427]  ksys_write+0x59/0xd0
> 2021-11-25 22:12:40 [ 3322.134743]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:40 [ 3322.138325] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:40 [ 3322.143377] RIP: 0033:0x7f8e8096a847
> 2021-11-25 22:12:40 [ 3322.146954] RSP: 002b:00007f8a072b05c0 EFLAGS: 
> 00000293 ORIG_RAX: 0000000000000001
> 2021-11-25 22:12:40 [ 3322.154523] RAX: ffffffffffffffda RBX: 
> 0000000000000c48 RCX: 00007f8e8096a847
> 2021-11-25 22:12:40 [ 3322.161657] RDX: 0000000000002000 RSI: 
> 00007f89a48e91f0 RDI: 0000000000000c48
> 2021-11-25 22:12:40 [ 3322.168787] RBP: 00007f89a48e91f0 R08: 
> 0000000000000000 R09: 000000043389c888
> 2021-11-25 22:12:40 [ 3322.175920] R10: 00000000000012b8 R11: 
> 0000000000000293 R12: 0000000000002000
> 2021-11-25 22:12:40 [ 3322.183052] R13: 00007f89a48e91f0 R14: 
> 00007f8a072b0650 R15: 00007f88d8003000
> 2021-11-25 22:12:40 [ 3322.190187] INFO: task P2P Transfer - :58302 
> blocked for more than 123 seconds.
> 2021-11-25 22:12:40 [ 3322.197495]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:40 [ 3322.203931] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:40 [ 3322.211757] task:P2P Transfer -  state:D stack: 
>     0 pid:58302 ppid:  4294 flags:0x00000080
> 2021-11-25 22:12:40 [ 3322.220105] Call Trace:
> 2021-11-25 22:12:40 [ 3322.222558]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:40 [ 3322.226057]  schedule+0x3c/0xa0
> 2021-11-25 22:12:40 [ 3322.229203]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:40 [ 3322.233918]  check_quota_exceeded+0x64/0x220 [ceph]
> 2021-11-25 22:12:40 [ 3322.238802]  ceph_write_iter+0x1bf/0xc90 [ceph]
> 2021-11-25 22:12:40 [ 3322.243341]  ? tcp_recvmsg+0x63e/0xb80
> 2021-11-25 22:12:40 [ 3322.247091]  ? inet6_recvmsg+0x5e/0x110
> 2021-11-25 22:12:40 [ 3322.250932]  ? new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:40 [ 3322.254944]  new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:40 [ 3322.258785]  vfs_write+0x1bd/0x270
> 2021-11-25 22:12:40 [ 3322.262188]  ksys_write+0x59/0xd0
> 2021-11-25 22:12:40 [ 3322.265510]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:40 [ 3322.269087] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:40 [ 3322.274154] RIP: 0033:0x7f8e8096a847
> 2021-11-25 22:12:40 [ 3322.277728] RSP: 002b:00007f8925bd75c0 EFLAGS: 
> 00000293 ORIG_RAX: 0000000000000001
> 2021-11-25 22:12:40 [ 3322.285296] RAX: ffffffffffffffda RBX: 
> 0000000000000c44 RCX: 00007f8e8096a847
> 2021-11-25 22:12:40 [ 3322.292427] RDX: 0000000000002000 RSI: 
> 00007f88ac001af0 RDI: 0000000000000c44
> 2021-11-25 22:12:40 [ 3322.299561] RBP: 00007f88ac001af0 R08: 
> 0000000000000000 R09: 000000043385aac8
> 2021-11-25 22:12:40 [ 3322.306691] R10: 00000000000012b8 R11: 
> 0000000000000293 R12: 0000000000002000
> 2021-11-25 22:12:40 [ 3322.313823] R13: 00007f88ac001af0 R14: 
> 00007f8925bd7650 R15: 00007f88d8007000
> 2021-11-25 22:12:40 [ 3322.320958] INFO: task P2P Transfer - :58781 
> blocked for more than 123 seconds.
> 2021-11-25 22:12:40 [ 3322.328266]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:40 [ 3322.334704] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:40 [ 3322.342528] task:P2P Transfer -  state:D stack: 
>     0 pid:58781 ppid:  4294 flags:0x00000080
> 2021-11-25 22:12:40 [ 3322.350876] Call Trace:
> 2021-11-25 22:12:40 [ 3322.353327]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:40 [ 3322.356820]  schedule+0x3c/0xa0
> 2021-11-25 22:12:40 [ 3322.359967]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:40 [ 3322.364677]  check_quota_exceeded+0x64/0x220 [ceph]
> 2021-11-25 22:12:40 [ 3322.369569]  ceph_write_iter+0x1bf/0xc90 [ceph]
> 2021-11-25 22:12:40 [ 3322.374103]  ? tcp_recvmsg+0x63e/0xb80
> 2021-11-25 22:12:40 [ 3322.377854]  ? inet6_recvmsg+0x5e/0x110
> 2021-11-25 22:12:40 [ 3322.381695]  ? sock_recvmsg+0x1c/0x70
> 2021-11-25 22:12:40 [ 3322.385361]  ? new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:40 [ 3322.389370]  new_sync_write+0x11f/0x1b0
> 2021-11-25 22:12:40 [ 3322.393211]  vfs_write+0x1bd/0x270
> 2021-11-25 22:12:40 [ 3322.396617]  ksys_write+0x59/0xd0
> 2021-11-25 22:12:40 [ 3322.399937]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:40 [ 3322.403517] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:40 [ 3322.408567] RIP: 0033:0x7f8e8096a847
> 2021-11-25 22:12:40 [ 3322.412194] RSP: 002b:00007f8a071af5c0 EFLAGS: 
> 00000293 ORIG_RAX: 0000000000000001
> 2021-11-25 22:12:40 [ 3322.419769] RAX: ffffffffffffffda RBX: 
> 0000000000000c4c RCX: 00007f8e8096a847
> 2021-11-25 22:12:40 [ 3322.426902] RDX: 0000000000002000 RSI: 
> 00007f89e8303f20 RDI: 0000000000000c4c
> 2021-11-25 22:12:40 [ 3322.434030] RBP: 00007f89e8303f20 R08: 
> 0000000000000000 R09: 000000045dae1508
> 2021-11-25 22:12:40 [ 3322.441164] R10: 00000000000012b8 R11: 
> 0000000000000293 R12: 0000000000002000
> 2021-11-25 22:12:40 [ 3322.448298] R13: 00007f89e8303f20 R14: 
> 00007f8a071af650 R15: 00007f88d800b000
> 2021-11-25 22:12:40 [ 3322.455446] INFO: task P2P Transfer - :58844 
> blocked for more than 123 seconds.
> 2021-11-25 22:12:40 [ 3322.462755]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:40 [ 3322.469195] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:40 [ 3322.477021] task:P2P Transfer -  state:D stack: 
>     0 pid:58844 ppid:  4294 flags:0x00000080
> 2021-11-25 22:12:40 [ 3322.485366] Call Trace:
> 2021-11-25 22:12:40 [ 3322.487823]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:40 [ 3322.491321]  schedule+0x3c/0xa0
> 2021-11-25 22:12:40 [ 3322.494469]  rwsem_down_write_slowpath+0x2f0/0x4a0
> 2021-11-25 22:12:40 [ 3322.499261]  path_openat+0x279/0x1050
> 2021-11-25 22:12:40 [ 3322.502930]  ? task_numa_fault+0x74c/0xae0
> 2021-11-25 22:12:40 [ 3322.507036]  do_filp_open+0x93/0x100
> 2021-11-25 22:12:40 [ 3322.510615]  ? handle_mm_fault+0xb1f/0xb70
> 2021-11-25 22:12:40 [ 3322.514714]  ? __check_object_size+0x162/0x180
> 2021-11-25 22:12:40 [ 3322.519159]  do_sys_openat2+0x21e/0x2d0
> 2021-11-25 22:12:40 [ 3322.523001]  do_sys_open+0x4b/0x80
> 2021-11-25 22:12:40 [ 3322.526415]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:40 [ 3322.530003] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:40 [ 3322.535052] RIP: 0033:0x7f8e8096b0d6
> 2021-11-25 22:12:40 [ 3322.538632] RSP: 002b:00007f8a070ae640 EFLAGS: 
> 00000293 ORIG_RAX: 0000000000000101
> 2021-11-25 22:12:40 [ 3322.546201] RAX: ffffffffffffffda RBX: 
> 000000045964b830 RCX: 00007f8e8096b0d6
> 2021-11-25 22:12:40 [ 3322.553342] RDX: 00000000000000c1 RSI: 
> 00007f89d807bb40 RDI: 00000000ffffff9c
> 2021-11-25 22:12:40 [ 3322.560473] RBP: 00007f8a070ae6e0 R08: 
> 0000000000000000 R09: 000000008b2c96e6
> 2021-11-25 22:12:40 [ 3322.567606] R10: 00000000000001b6 R11: 
> 0000000000000293 R12: 00000000000001b6
> 2021-11-25 22:12:40 [ 3322.574739] R13: 00000000000000c1 R14: 
> 00007f89d807bb40 R15: 00007f88d8008b48
> 2021-11-25 22:12:40 [ 3322.581885] INFO: task vega_izum_si_03:58847 
> blocked for more than 124 seconds.
> 2021-11-25 22:12:40 [ 3322.589193]       Tainted: G           O 
>       5.10.78-2.el8.x86_64 #1
> 2021-11-25 22:12:40 [ 3322.595634] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2021-11-25 22:12:40 [ 3322.603458] task:vega_izum_si_03 state:D stack: 
>     0 pid:58847 ppid:  4294 flags:0x00000080
> 2021-11-25 22:12:40 [ 3322.611806] Call Trace:
> 2021-11-25 22:12:40 [ 3322.614262]  __schedule+0x291/0x7a0
> 2021-11-25 22:12:40 [ 3322.617768]  ? __touch_cap+0x1f/0xd0 [ceph]
> 2021-11-25 22:12:40 [ 3322.621957]  schedule+0x3c/0xa0
> 2021-11-25 22:12:40 [ 3322.625099]  rwsem_down_read_slowpath+0x2f6/0x4a0
> 2021-11-25 22:12:40 [ 3322.629806]  ? lookup_fast+0xae/0x150
> 2021-11-25 22:12:40 [ 3322.633472]  walk_component+0x129/0x1b0
> 2021-11-25 22:12:40 [ 3322.637315]  ? path_init+0x2ef/0x360
> 2021-11-25 22:12:40 [ 3322.640902]  path_lookupat.isra.42+0x67/0x140
> 2021-11-25 22:12:40 [ 3322.645258]  filename_lookup.part.56+0xa0/0x170
> 2021-11-25 22:12:40 [ 3322.649793]  ? __check_object_size+0x162/0x180
> 2021-11-25 22:12:40 [ 3322.654238]  ? strncpy_from_user+0x46/0x1e0
> 2021-11-25 22:12:40 [ 3322.658422]  vfs_statx+0x72/0x110
> 2021-11-25 22:12:40 [ 3322.661740]  __do_sys_newstat+0x39/0x70
> 2021-11-25 22:12:40 [ 3322.665584]  ? 
> syscall_trace_enter.isra.19+0x123/0x190
> 2021-11-25 22:12:40 [ 3322.670722]  do_syscall_64+0x33/0x40
> 2021-11-25 22:12:40 [ 3322.674304] 
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 2021-11-25 22:12:40 [ 3322.679375] RIP: 0033:0x7f8e8026ba79
> 2021-11-25 22:12:40 [ 3322.682949] RSP: 002b:00007f8a05d9d048 EFLAGS: 
> 00000246 ORIG_RAX: 0000000000000004
> 2021-11-25 22:12:40 [ 3322.690517] RAX: ffffffffffffffda RBX: 
> 00007f8a05d9d050 RCX: 00007f8e8026ba79
> 2021-11-25 22:12:40 [ 3322.697650] RDX: 00007f8a05d9d050 RSI: 
> 00007f8a05d9d050 RDI: 00007f8900018220
> 2021-11-25 22:12:40 [ 3322.704783] RBP: 00007f8a05d9d100 R08: 
> 0000000000000000 R09: 0000000459723280
> 2021-11-25 22:12:40 [ 3322.711917] R10: 00007f8e687103a5 R11: 
> 0000000000000246 R12: 00007f8900018220
> 2021-11-25 22:12:40 [ 3322.719045] R13: 00007f8c540c7b48 R14: 
> 00007f8a05d9d118 R15: 00007f8c540c7800
> 2021-11-25 22:13:46 [ 3388.045080] ceph: mds0 hung
> 
> 
> -- 
> _____________________________________________________________
>     prof. dr. Andrej Filipcic,   E-mail:Andrej.Filipcic@xxxxxx
>     Department of Experimental High Energy Physics - F9
>     Jozef Stefan Institute, Jamova 39, P.o.Box 3000
>     SI-1001 Ljubljana, Slovenia
>     Tel.: +386-1-477-3674    Fax: +386-1-477-3166
> -------------------------------------------------------------
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
Jeff Layton <jlayton@xxxxxxxxxx>




