On Mon, 2020-04-06 at 12:23 +0200, Jesper Krogh wrote: > It also emits this one - quite a bit later (not sure thats relevant) > > [43364.349446] kworker/3:6: page allocation failure: order:0, > mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 It failed to allocate a single page (order:0) to handle a receive. This is a GFP_ATOMIC allocation which means that it's not allowed to sleep (and thus can't do any active reclaim). > [43364.349459] CPU: 3 PID: 23433 Comm: kworker/3:6 Not tainted 5.4.30 #5 > [43364.349460] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [43364.349515] Workqueue: ceph-msgr ceph_con_workfn [libceph] > [43364.349516] Call Trace: > [43364.349523] <IRQ> > [43364.349594] dump_stack+0x6d/0x95 > [43364.349614] warn_alloc+0x10c/0x170 > [43364.349616] __alloc_pages_slowpath+0xe6c/0xef0 > [43364.349634] ? ip_local_deliver_finish+0x48/0x50 > [43364.349635] ? ip_local_deliver+0x6f/0xe0 > [43364.349640] ? tcp_v4_early_demux+0x11c/0x150 > [43364.349641] __alloc_pages_nodemask+0x2f3/0x360 > [43364.349643] page_frag_alloc+0x129/0x150 > [43364.349654] __napi_alloc_skb+0x86/0xd0 > [43364.349670] page_to_skb+0x67/0x350 [virtio_net] > [43364.349672] receive_buf+0xe47/0x16c0 [virtio_net] > [43364.349675] virtnet_poll+0xf2/0x364 [virtio_net] > [43364.349679] net_rx_action+0x265/0x3e0 > [43364.349691] __do_softirq+0xf9/0x2aa > [43364.349701] irq_exit+0xae/0xb0 > [43364.349705] do_IRQ+0x59/0xe0 > [43364.349706] common_interrupt+0xf/0xf > [43364.349707] </IRQ> > [43364.349713] RIP: 0010:kvm_clock_get_cycles+0xc/0x20 > [43364.349715] Code: c3 48 c1 e1 06 31 c0 48 81 c1 00 10 60 84 49 89 > 0c 10 eb be 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 65 48 8b 3d e4 > 8f 7a 7d <e8> ff 0f 00 00 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 > 55 48 > [43364.349716] RSP: 0018:ffffc20941f4bab8 EFLAGS: 00000246 ORIG_RAX: > ffffffffffffffdb > [43364.349717] RAX: ffffffff8286e040 RBX: 0000277062955ccd RCX: 000000000000056a > [43364.349735] RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffffffff846010c0 > [43364.349750] RBP: ffffc20941f4bab8 R08: 0000000000008f3e R09: 000000000000097a > [43364.349751] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000000000 > [43364.349751] R13: 00000000004a63a2 R14: ffff9c1ac83665b4 R15: 0000000000000000 > [43364.349753] ? kvmclock_setup_percpu+0x80/0x80 > [43364.349760] ktime_get+0x3e/0xa0 > [43364.349763] tcp_mstamp_refresh+0x12/0x40 > [43364.349764] tcp_rcv_space_adjust+0x22/0x1d0 > [43364.349766] tcp_recvmsg+0x28b/0xbc0 > [43364.349777] ? aa_sk_perm+0x43/0x190 > [43364.349781] inet_recvmsg+0x64/0xf0 > [43364.349786] sock_recvmsg+0x66/0x70 > [43364.349791] ceph_tcp_recvpage+0x79/0xb0 [libceph] > [43364.349796] read_partial_message+0x3c3/0x7c0 [libceph] > [43364.349801] ceph_con_workfn+0xa6a/0x23d0 [libceph] > [43364.349809] process_one_work+0x167/0x400 > [43364.349810] worker_thread+0x4d/0x460 > [43364.349814] kthread+0x105/0x140 > [43364.349815] ? rescuer_thread+0x370/0x370 > [43364.349816] ? kthread_destroy_worker+0x50/0x50 > [43364.349817] ret_from_fork+0x35/0x40 > [43364.349845] Mem-Info: > [43364.349849] active_anon:10675 inactive_anon:12484 isolated_anon:0 > active_file:14532 inactive_file:3900471 isolated_file:8 > unevictable:0 dirty:1 writeback:0 unstable:0 > slab_reclaimable:68339 slab_unreclaimable:36220 > mapped:11089 shmem:608 pagetables:1093 bounce:0 > free:25551 free_pcp:3443 free_cma:0 > [43364.349851] Node 0 active_anon:42700kB inactive_anon:49936kB > active_file:58128kB inactive_file:15601884kB unevictable:0kB > isolated(anon):0kB isolated(file):32kB mapped:44356kB dirty:4kB > writeback:0kB shmem:2432kB shmem_thp: 0kB shmem_pmdmapped: 0kB > anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no > [43364.349852] Node 0 DMA free:15908kB min:64kB low:80kB high:96kB > active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB > unevictable:0kB writepending:0kB present:15992kB managed:15908kB > mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB > local_pcp:0kB free_cma:0kB > [43364.349857] lowmem_reserve[]: 0 3444 15930 15930 15930 > [43364.349859] Node 0 DMA32 free:55240kB min:14596kB low:18244kB > high:21892kB active_anon:1556kB inactive_anon:188kB active_file:48kB > inactive_file:3430152kB unevictable:0kB writepending:0kB > present:3653608kB managed:3588072kB mlocked:0kB kernel_stack:0kB > pagetables:0kB bounce:0kB free_pcp:3568kB local_pcp:248kB free_cma:0kB > [43364.349861] lowmem_reserve[]: 0 0 12485 12485 12485 > [43364.349862] Node 0 Normal free:31056kB min:83636kB low:96864kB > high:110092kB active_anon:41144kB inactive_anon:49748kB > active_file:58080kB inactive_file:12171732kB unevictable:0kB > writepending:4kB present:13107200kB managed:12793540kB mlocked:0kB > kernel_stack:7472kB pagetables:4372kB bounce:0kB free_pcp:10204kB > local_pcp:1456kB free_cma:0kB > [43364.349864] lowmem_reserve[]: 0 0 0 0 0 > [43364.349865] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB > (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB > (M) = 15908kB > [43364.349877] Node 0 DMA32: 22*4kB (EH) 18*8kB (UMEH) 0*16kB 13*32kB > (UME) 17*64kB (UM) 18*128kB (UMEH) 14*256kB (UEH) 5*512kB (UE) > 24*1024kB (UM) 10*2048kB (M) 0*4096kB = 55240kB > [43364.349894] Node 0 Normal: 94*4kB (UMEH) 47*8kB (MEH) 60*16kB > (UMEH) 38*32kB (UMEH) 30*64kB (MEH) 21*128kB (MEH) 37*256kB (UM) > 28*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 31344kB Here though, it looks like there is quite a bit of free memory, so I'm not sure why the allocation failed. It's possible that at the time that the allocation failed there was nothing free, but just afterward quite a bit of memory did become free (before the kernel could print out the summary). It may also be indicative of something else being wrong. If this happened much later, then it may just be fallout from whatever the original problem was. > [43364.349920] Node 0 hugepages_total=0 hugepages_free=0 > hugepages_surp=0 hugepages_size=1048576kB > [43364.349927] Node 0 hugepages_total=0 hugepages_free=0 > hugepages_surp=0 hugepages_size=2048kB > [43364.349928] 3915698 total pagecache pages > [43364.349930] 79 pages in swap cache > [43364.349931] Swap cache stats: add 393, delete 314, find 38/80 > [43364.349932] Free swap = 4036080kB > [43364.349932] Total swap = 4038652kB > [43364.349933] 4194200 pages RAM > [43364.349933] 0 pages HighMem/MovableOnly > [43364.349934] 94820 pages reserved > [43364.349934] 0 pages cma reserved > [43364.349934] 0 pages hwpoisoned > > On Mon, Apr 6, 2020 at 11:18 AM Jesper Krogh <jesper.krogh@xxxxxxxxx> wrote: > > And it looks like I can reproduce this pattern - and when it has been > > stuck for "sufficient" amount of time - then it gets blacklisted by > > the MDS/OSD and (a related) issue is that I cannot get the mountpoint > > back without a reboot on the kernel-client side. > > > > On Mon, Apr 6, 2020 at 11:09 AM Jesper Krogh <jesper.krogh@xxxxxxxxx> wrote: > > > home-directory style - median is about 3KB - but varies greatly. > > > > > > I also get this: > > > [41204.865818] INFO: task kworker/u16:102:21903 blocked for more than > > > 120 seconds. > > > [41204.865955] Not tainted 5.4.30 #5 > > > [41204.866006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [41204.866056] kworker/u16:102 D 0 21903 2 0x80004000 > > > [41204.866119] Workqueue: ceph-inode ceph_inode_work [ceph] > > > [41204.866120] Call Trace: > > > [41204.866156] __schedule+0x45f/0x710 > > > [41204.866162] ? xas_store+0x391/0x5f0 > > > [41204.866164] schedule+0x3e/0xa0 > > > [41204.866166] io_schedule+0x16/0x40 > > > [41204.866180] __lock_page+0x12a/0x1d0 > > > [41204.866182] ? file_fdatawait_range+0x30/0x30 > > > [41204.866187] truncate_inode_pages_range+0x52c/0x980 > > > [41204.866191] ? syscall_return_via_sysret+0x12/0x7f > > > [41204.866197] ? drop_inode_snap_realm+0x98/0xa0 [ceph] > > > [41204.866207] ? fsnotify_grab_connector+0x4d/0x90 > > > [41204.866209] truncate_inode_pages_final+0x4c/0x60 > > > [41204.866214] ceph_evict_inode+0x2d/0x210 [ceph] > > > [41204.866219] evict+0xca/0x1a0 > > > [41204.866221] iput+0x1ba/0x210 > > > [41204.866225] ceph_inode_work+0x40/0x270 [ceph] > > > [41204.866232] process_one_work+0x167/0x400 > > > [41204.866233] worker_thread+0x4d/0x460 > > > [41204.866236] kthread+0x105/0x140 > > > [41204.866237] ? rescuer_thread+0x370/0x370 > > > [41204.866239] ? kthread_destroy_worker+0x50/0x50 > > > [41204.866240] ret_from_fork+0x35/0x40 > > > > > > On Mon, Apr 6, 2020 at 10:53 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote: > > > > On Mon, Apr 6, 2020 at 4:06 PM Jesper Krogh <jesper.krogh@xxxxxxxxx> wrote: > > > > > This is a CephFS client - its only purpose is to run the "filedaemon" of bacula > > > > > and transport data to the tape-library from CephFS - below 2 threads is > > > > > essentially doing something equivalent to > > > > > > > > > > find /cephfs/ -type f | xargs cat | nc server > > > > > > > > > > 2 threads only, load exploding and the "net read vs net write" has > > > > > more than 100x difference. > > > > > > > > > > Can anyone explain this as "normal" behaviour? > > > > > Server is a VM with 16 "vCPU" and 16GB memory running libvirt/qemu > > > > > > > > > > jk@wombat:~$ w > > > > > 07:50:33 up 11:25, 1 user, load average: 206.43, 76.23, 50.58 > > > > > USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT > > > > > jk pts/0 10.194.133.42 06:54 0.00s 0.05s 0.00s w > > > > > jk@wombat:~$ dstat -ar > > > > > --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- --io/total- > > > > > usr sys idl wai stl| read writ| recv send| in out | int csw | read writ > > > > > 0 0 98 1 0| 14k 34k| 0 0 | 3B 27B| 481 294 |0.55 0.73 > > > > > 1 1 0 98 0| 0 0 | 60M 220k| 0 0 |6402 6182 | 0 0 > > > > > 0 1 0 98 0| 0 0 | 69M 255k| 0 0 |7305 4339 | 0 0 > > > > > 1 2 0 98 0| 0 0 | 76M 282k| 0 0 |7914 4886 | 0 0 > > > > > 1 1 0 99 0| 0 0 | 70M 260k| 0 0 |7293 4444 | 0 0 > > > > > 1 1 0 98 0| 0 0 | 80M 278k| 0 0 |8018 4931 | 0 0 > > > > > 0 1 0 98 0| 0 0 | 60M 221k| 0 0 |6435 5951 | 0 0 > > > > > 0 1 0 99 0| 0 0 | 59M 211k| 0 0 |6163 3584 | 0 0 > > > > > 0 1 0 98 0| 0 0 | 64M 323k| 0 0 |6653 3881 | 0 0 > > > > > 1 0 0 99 0| 0 0 | 61M 243k| 0 0 |6822 4401 | 0 0 > > > > > 0 1 0 99 0| 0 0 | 55M 205k| 0 0 |5975 3518 | 0 0 > > > > > 1 1 0 98 0| 0 0 | 68M 242k| 0 0 |7094 6544 | 0 0 > > > > > 0 1 0 99 0| 0 0 | 58M 230k| 0 0 |6639 4178 | 0 0 > > > > > 1 2 0 98 0| 0 0 | 61M 243k| 0 0 |7117 4477 | 0 0 > > > > > 0 1 0 99 0| 0 0 | 61M 228k| 0 0 |6500 4078 | 0 0 > > > > > 0 1 0 99 0| 0 0 | 65M 234k| 0 0 |6595 3914 | 0 0 > > > > > 0 1 0 98 0| 0 0 | 64M 219k| 0 0 |6507 5755 | 0 0 > > > > > 1 1 0 99 0| 0 0 | 64M 233k| 0 0 |6869 4153 | 0 0 > > > > > 1 2 0 98 0| 0 0 | 63M 232k| 0 0 |6632 3907 | > > > > > 0 0 ^C > > > > > jk@wombat:~$ w > > > > > 07:50:56 up 11:25, 1 user, load average: 221.35, 88.07, 55.02 > > > > > USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT > > > > > jk pts/0 10.194.133.42 06:54 0.00s 0.05s 0.00s w > > > > > jk@wombat:~$ > > > > > > > > > > Thanks. > > > > > > > > how small these files are? -- Jeff Layton <jlayton@xxxxxxxxxx>