6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU

Hi,

Since the 6.5 kernel addresses the regression in the readahead handling code, we went ahead and installed it on a couple of mail / web clusters (Ubuntu 6.5.1-060501-generic #202309020842 SMP PREEMPT_DYNAMIC Sat Sep 2 08:48:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux). Since then we occasionally see the following logged by the kernel:

[Sun Sep 10 07:19:00 2023] workqueue: delayed_work [ceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 08:41:24 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 11:05:55 2023] workqueue: delayed_work [ceph] hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 12:54:38 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 19:06:37 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
[Mon Sep 11 10:53:33 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 32 times, consider switching to WQ_UNBOUND
[Tue Sep 12 10:14:03 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 64 times, consider switching to WQ_UNBOUND
[Tue Sep 12 11:14:33 2023] workqueue: ceph_cap_reclaim_work [ceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
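For reference, a minimal illustrative sketch (not the actual ceph / libceph source; all names are made up for the example) of what "switching to WQ_UNBOUND" would mean at the workqueue allocation site:

    /* Illustrative only -- not the actual ceph/libceph code. */
    #include <linux/module.h>
    #include <linux/workqueue.h>

    static struct workqueue_struct *example_wq;

    static int __init example_init(void)
    {
            /* A bound (per-CPU) workqueue: work items that run too long
             * on it are what the "hogged CPU ... consider switching to
             * WQ_UNBOUND" message complains about. */
            example_wq = alloc_workqueue("example_wq", 0, 0);

            /* The unbound alternative the message refers to would be:
             * example_wq = alloc_workqueue("example_wq", WQ_UNBOUND, 0);
             * which lets the scheduler move long-running work items off
             * the submitting CPU. */

            return example_wq ? 0 : -ENOMEM;
    }

    static void __exit example_exit(void)
    {
            destroy_workqueue(example_wq);
    }

    module_init(example_init);
    module_exit(example_exit);
    MODULE_LICENSE("GPL");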

We wonder whether this is a new phenomenon, or whether it also happened before and simply was not logged by older kernels.

However, we have hit a few OOM situations since switching to the new kernel because of ceph_cap_reclaim_work events (the OOM occurs because Apache threads keep piling up while they cannot access CephFS). At the same time we see slow ops reported by the MDS. This might be related to a backup job running on a backup server. We did not observe this behavior on the 5.12.19 kernel.

The Ceph cluster is currently on 16.2.11.

Does anyone have any insight into this?

Thanks,

Stefan



