It also emits this one - quite a bit later (not sure that's relevant):

[43364.349446] kworker/3:6: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[43364.349459] CPU: 3 PID: 23433 Comm: kworker/3:6 Not tainted 5.4.30 #5
[43364.349460] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[43364.349515] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[43364.349516] Call Trace:
[43364.349523]  <IRQ>
[43364.349594]  dump_stack+0x6d/0x95
[43364.349614]  warn_alloc+0x10c/0x170
[43364.349616]  __alloc_pages_slowpath+0xe6c/0xef0
[43364.349634]  ? ip_local_deliver_finish+0x48/0x50
[43364.349635]  ? ip_local_deliver+0x6f/0xe0
[43364.349640]  ? tcp_v4_early_demux+0x11c/0x150
[43364.349641]  __alloc_pages_nodemask+0x2f3/0x360
[43364.349643]  page_frag_alloc+0x129/0x150
[43364.349654]  __napi_alloc_skb+0x86/0xd0
[43364.349670]  page_to_skb+0x67/0x350 [virtio_net]
[43364.349672]  receive_buf+0xe47/0x16c0 [virtio_net]
[43364.349675]  virtnet_poll+0xf2/0x364 [virtio_net]
[43364.349679]  net_rx_action+0x265/0x3e0
[43364.349691]  __do_softirq+0xf9/0x2aa
[43364.349701]  irq_exit+0xae/0xb0
[43364.349705]  do_IRQ+0x59/0xe0
[43364.349706]  common_interrupt+0xf/0xf
[43364.349707]  </IRQ>
[43364.349713] RIP: 0010:kvm_clock_get_cycles+0xc/0x20
[43364.349715] Code: c3 48 c1 e1 06 31 c0 48 81 c1 00 10 60 84 49 89 0c 10 eb be 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 65 48 8b 3d e4 8f 7a 7d <e8> ff 0f 00 00 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48
[43364.349716] RSP: 0018:ffffc20941f4bab8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[43364.349717] RAX: ffffffff8286e040 RBX: 0000277062955ccd RCX: 000000000000056a
[43364.349735] RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffffffff846010c0
[43364.349750] RBP: ffffc20941f4bab8 R08: 0000000000008f3e R09: 000000000000097a
[43364.349751] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000000000
[43364.349751] R13: 00000000004a63a2 R14: ffff9c1ac83665b4 R15: 0000000000000000
[43364.349753]  ? kvmclock_setup_percpu+0x80/0x80
[43364.349760]  ktime_get+0x3e/0xa0
[43364.349763]  tcp_mstamp_refresh+0x12/0x40
[43364.349764]  tcp_rcv_space_adjust+0x22/0x1d0
[43364.349766]  tcp_recvmsg+0x28b/0xbc0
[43364.349777]  ? aa_sk_perm+0x43/0x190
[43364.349781]  inet_recvmsg+0x64/0xf0
[43364.349786]  sock_recvmsg+0x66/0x70
[43364.349791]  ceph_tcp_recvpage+0x79/0xb0 [libceph]
[43364.349796]  read_partial_message+0x3c3/0x7c0 [libceph]
[43364.349801]  ceph_con_workfn+0xa6a/0x23d0 [libceph]
[43364.349809]  process_one_work+0x167/0x400
[43364.349810]  worker_thread+0x4d/0x460
[43364.349814]  kthread+0x105/0x140
[43364.349815]  ? rescuer_thread+0x370/0x370
[43364.349816]  ? kthread_destroy_worker+0x50/0x50
[43364.349817]  ret_from_fork+0x35/0x40
[43364.349845] Mem-Info:
[43364.349849] active_anon:10675 inactive_anon:12484 isolated_anon:0
                active_file:14532 inactive_file:3900471 isolated_file:8
                unevictable:0 dirty:1 writeback:0 unstable:0
                slab_reclaimable:68339 slab_unreclaimable:36220
                mapped:11089 shmem:608 pagetables:1093 bounce:0
                free:25551 free_pcp:3443 free_cma:0
[43364.349851] Node 0 active_anon:42700kB inactive_anon:49936kB active_file:58128kB inactive_file:15601884kB unevictable:0kB isolated(anon):0kB isolated(file):32kB mapped:44356kB dirty:4kB writeback:0kB shmem:2432kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[43364.349852] Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[43364.349857] lowmem_reserve[]: 0 3444 15930 15930 15930
[43364.349859] Node 0 DMA32 free:55240kB min:14596kB low:18244kB high:21892kB active_anon:1556kB inactive_anon:188kB active_file:48kB inactive_file:3430152kB unevictable:0kB writepending:0kB present:3653608kB managed:3588072kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:3568kB local_pcp:248kB free_cma:0kB
[43364.349861] lowmem_reserve[]: 0 0 12485 12485 12485
[43364.349862] Node 0 Normal free:31056kB min:83636kB low:96864kB high:110092kB active_anon:41144kB inactive_anon:49748kB active_file:58080kB inactive_file:12171732kB unevictable:0kB writepending:4kB present:13107200kB managed:12793540kB mlocked:0kB kernel_stack:7472kB pagetables:4372kB bounce:0kB free_pcp:10204kB local_pcp:1456kB free_cma:0kB
[43364.349864] lowmem_reserve[]: 0 0 0 0 0
[43364.349865] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[43364.349877] Node 0 DMA32: 22*4kB (EH) 18*8kB (UMEH) 0*16kB 13*32kB (UME) 17*64kB (UM) 18*128kB (UMEH) 14*256kB (UEH) 5*512kB (UE) 24*1024kB (UM) 10*2048kB (M) 0*4096kB = 55240kB
[43364.349894] Node 0 Normal: 94*4kB (UMEH) 47*8kB (MEH) 60*16kB (UMEH) 38*32kB (UMEH) 30*64kB (MEH) 21*128kB (MEH) 37*256kB (UM) 28*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 31344kB
[43364.349920] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[43364.349927] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[43364.349928] 3915698 total pagecache pages
[43364.349930] 79 pages in swap cache
[43364.349931] Swap cache stats: add 393, delete 314, find 38/80
[43364.349932] Free swap  = 4036080kB
[43364.349932] Total swap = 4038652kB
[43364.349933] 4194200 pages RAM
[43364.349933] 0 pages HighMem/MovableOnly
[43364.349934] 94820 pages reserved
[43364.349934] 0 pages cma reserved
[43364.349934] 0 pages hwpoisoned

On Mon, Apr 6, 2020 at 11:18 AM Jesper Krogh <jesper.krogh@xxxxxxxxx> wrote:
>
> And it looks like I can reproduce this pattern - and when it has been
> stuck for a "sufficient" amount of time, it gets blacklisted by
> the MDS/OSD, and (a related) issue is that I cannot get the mountpoint
> back without a reboot on the kernel-client side.
>
> On Mon, Apr 6, 2020 at 11:09 AM Jesper Krogh <jesper.krogh@xxxxxxxxx> wrote:
> >
> > home-directory style - median is about 3KB - but varies greatly.
> >
> > I also get this:
> > [41204.865818] INFO: task kworker/u16:102:21903 blocked for more than 120 seconds.
> > [41204.865955]       Not tainted 5.4.30 #5
> > [41204.866006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [41204.866056] kworker/u16:102 D    0 21903      2 0x80004000
> > [41204.866119] Workqueue: ceph-inode ceph_inode_work [ceph]
> > [41204.866120] Call Trace:
> > [41204.866156]  __schedule+0x45f/0x710
> > [41204.866162]  ? xas_store+0x391/0x5f0
> > [41204.866164]  schedule+0x3e/0xa0
> > [41204.866166]  io_schedule+0x16/0x40
> > [41204.866180]  __lock_page+0x12a/0x1d0
> > [41204.866182]  ? file_fdatawait_range+0x30/0x30
> > [41204.866187]  truncate_inode_pages_range+0x52c/0x980
> > [41204.866191]  ? syscall_return_via_sysret+0x12/0x7f
> > [41204.866197]  ? drop_inode_snap_realm+0x98/0xa0 [ceph]
> > [41204.866207]  ? fsnotify_grab_connector+0x4d/0x90
> > [41204.866209]  truncate_inode_pages_final+0x4c/0x60
> > [41204.866214]  ceph_evict_inode+0x2d/0x210 [ceph]
> > [41204.866219]  evict+0xca/0x1a0
> > [41204.866221]  iput+0x1ba/0x210
> > [41204.866225]  ceph_inode_work+0x40/0x270 [ceph]
> > [41204.866232]  process_one_work+0x167/0x400
> > [41204.866233]  worker_thread+0x4d/0x460
> > [41204.866236]  kthread+0x105/0x140
> > [41204.866237]  ? rescuer_thread+0x370/0x370
> > [41204.866239]  ? kthread_destroy_worker+0x50/0x50
> > [41204.866240]  ret_from_fork+0x35/0x40
> >
> > On Mon, Apr 6, 2020 at 10:53 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > >
> > > On Mon, Apr 6, 2020 at 4:06 PM Jesper Krogh <jesper.krogh@xxxxxxxxx> wrote:
> > > >
> > > > This is a CephFS client - its only purpose is to run the "filedaemon" of bacula
> > > > and transport data to the tape-library from CephFS - the 2 threads below are
> > > > essentially doing something equivalent to
> > > >
> > > > find /cephfs/ -type f | xargs cat | nc server
> > > >
> > > > 2 threads only, load exploding, and the "net read vs net write" has
> > > > more than 100x difference.
> > > >
> > > > Can anyone explain this as "normal" behaviour?
> > > > Server is a VM with 16 "vCPU" and 16GB memory running libvirt/qemu
> > > >
> > > > jk@wombat:~$ w
> > > >  07:50:33 up 11:25,  1 user,  load average: 206.43, 76.23, 50.58
> > > > USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
> > > > jk       pts/0    10.194.133.42    06:54    0.00s  0.05s  0.00s w
> > > > jk@wombat:~$ dstat -ar
> > > > --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- --io/total-
> > > > usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw | read  writ
> > > >   0   0  98   1   0|  14k   34k|   0     0 |   3B   27B| 481   294 |0.55  0.73
> > > >   1   1   0  98   0|   0     0 |  60M  220k|   0     0 |6402  6182 |   0     0
> > > >   0   1   0  98   0|   0     0 |  69M  255k|   0     0 |7305  4339 |   0     0
> > > >   1   2   0  98   0|   0     0 |  76M  282k|   0     0 |7914  4886 |   0     0
> > > >   1   1   0  99   0|   0     0 |  70M  260k|   0     0 |7293  4444 |   0     0
> > > >   1   1   0  98   0|   0     0 |  80M  278k|   0     0 |8018  4931 |   0     0
> > > >   0   1   0  98   0|   0     0 |  60M  221k|   0     0 |6435  5951 |   0     0
> > > >   0   1   0  99   0|   0     0 |  59M  211k|   0     0 |6163  3584 |   0     0
> > > >   0   1   0  98   0|   0     0 |  64M  323k|   0     0 |6653  3881 |   0     0
> > > >   1   0   0  99   0|   0     0 |  61M  243k|   0     0 |6822  4401 |   0     0
> > > >   0   1   0  99   0|   0     0 |  55M  205k|   0     0 |5975  3518 |   0     0
> > > >   1   1   0  98   0|   0     0 |  68M  242k|   0     0 |7094  6544 |   0     0
> > > >   0   1   0  99   0|   0     0 |  58M  230k|   0     0 |6639  4178 |   0     0
> > > >   1   2   0  98   0|   0     0 |  61M  243k|   0     0 |7117  4477 |   0     0
> > > >   0   1   0  99   0|   0     0 |  61M  228k|   0     0 |6500  4078 |   0     0
> > > >   0   1   0  99   0|   0     0 |  65M  234k|   0     0 |6595  3914 |   0     0
> > > >   0   1   0  98   0|   0     0 |  64M  219k|   0     0 |6507  5755 |   0     0
> > > >   1   1   0  99   0|   0     0 |  64M  233k|   0     0 |6869  4153 |   0     0
> > > >   1   2   0  98   0|   0     0 |  63M  232k|   0     0 |6632  3907 |   0     0 ^C
> > > > jk@wombat:~$ w
> > > >  07:50:56 up 11:25,  1 user,  load average: 221.35, 88.07, 55.02
> > > > USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
> > > > jk       pts/0    10.194.133.42    06:54    0.00s  0.05s  0.00s w
> > > > jk@wombat:~$
> > > >
> > > > Thanks.
> > >
> > > how small are these files?
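One detail worth pulling out of the page allocation failure at the top: the per-order free counts in the "Node 0 Normal" buddy line sum to 31344kB, far below that zone's min watermark of 83636kB. A GFP_ATOMIC allocation cannot sleep to wait for reclaim, so with the Normal zone that deep under its watermarks the order-0 request fails outright even though ~15GB is sitting in (reclaimable) page cache. A quick sanity check of that sum - plain shell arithmetic, nothing system-specific:

```shell
# Sum the "Node 0 Normal" buddy list from the failure report:
# count * block-size-in-kB for each order (orders with 0 blocks omitted).
echo $(( 94*4 + 47*8 + 60*16 + 38*32 + 30*64 + 21*128 + 37*256 + 28*512 ))
# prints 31344 - matches the "= 31344kB" total, well under min:83636kB
```

The same per-order counters can be watched live in /proc/buddyinfo, and the zone watermarks in /proc/zoneinfo, to see whether the zone is chronically pinned below min while the backup runs.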