Re: 6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU


 




On 9/13/23 20:58, Ilya Dryomov wrote:
On Wed, Sep 13, 2023 at 9:20 AM Stefan Kooman <stefan@xxxxxx> wrote:
Hi,

Since the 6.5 kernel addressed the regression in the readahead handling
code... we went ahead and installed this kernel on a couple of mail /
web clusters (Ubuntu 6.5.1-060501-generic #202309020842 SMP
PREEMPT_DYNAMIC Sat Sep  2 08:48:34 UTC 2023 x86_64 x86_64 x86_64
GNU/Linux). Since then we occasionally see the following being logged
by the kernel:

[Sun Sep 10 07:19:00 2023] workqueue: delayed_work [ceph] hogged CPU for
   >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 08:41:24 2023] workqueue: ceph_con_workfn [libceph] hogged
CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 11:05:55 2023] workqueue: delayed_work [ceph] hogged CPU for
   >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 12:54:38 2023] workqueue: ceph_con_workfn [libceph] hogged
CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 19:06:37 2023] workqueue: ceph_con_workfn [libceph] hogged
CPU for >10000us 16 times, consider switching to WQ_UNBOUND
[Mon Sep 11 10:53:33 2023] workqueue: ceph_con_workfn [libceph] hogged
CPU for >10000us 32 times, consider switching to WQ_UNBOUND
[Tue Sep 12 10:14:03 2023] workqueue: ceph_con_workfn [libceph] hogged
CPU for >10000us 64 times, consider switching to WQ_UNBOUND
[Tue Sep 12 11:14:33 2023] workqueue: ceph_cap_reclaim_work [ceph]
hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND

We wonder whether this is a new phenomenon, or whether it is simply being
logged by the new kernel and was not before.
Hi Stefan,

This is something that wasn't logged in older kernels.  The kernel
workqueue infrastructure considers these Ceph work items CPU-intensive
and reports that in dmesg.  This is new in the 6.5 kernel; the threshold
can be tweaked with the workqueue.cpu_intensive_thresh_us parameter.
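For example, booting with a higher threshold on the kernel command line
should quiet these messages (just an illustration, not something I have
tested on your setup; the default is the 10000us you see in the log):

    workqueue.cpu_intensive_thresh_us=30000

If I remember correctly the same knob is also exposed at runtime via
/sys/module/workqueue/parameters/cpu_intensive_thresh_us.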

Hi Stefan,

Yeah, as far as I remember I have seen something like this only once before, in the cephfs qa tests together with other issues, but I thought it wasn't the root cause so I didn't spend time on it.

I just went through the kernel ceph code. For example, the 'ceph_cap_reclaim_work' work item takes the spin lock, and if there are too many dentries and directories that need to be iterated it may hog the CPU for a long time. Anyway, this can be improved.
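Something along these lines is what I have in mind, just a rough sketch of the general pattern and not the actual fs/ceph code (all the example_* names are made up): process a bounded batch of entries under the spinlock, then drop the lock and cond_resched() before continuing, so a single work item never monopolizes the CPU for the whole list.

#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/sched.h>

#define RECLAIM_BATCH	64

static LIST_HEAD(example_list);
static DEFINE_SPINLOCK(example_lock);

struct example_entry {
	struct list_head node;
};

static void example_reclaim_one(struct example_entry *e)
{
	/* per-entry reclaim work would go here */
}

static void example_reclaim_work(struct work_struct *work)
{
	struct example_entry *e;
	int batch;

	for (;;) {
		batch = 0;
		spin_lock(&example_lock);
		while (!list_empty(&example_list) && batch < RECLAIM_BATCH) {
			e = list_first_entry(&example_list,
					     struct example_entry, node);
			list_del_init(&e->node);
			example_reclaim_one(e);
			batch++;
		}
		spin_unlock(&example_lock);

		/* fewer than a full batch processed -> list is drained */
		if (batch < RECLAIM_BATCH)
			break;

		/* give up the CPU between batches */
		cond_resched();
	}
}

static DECLARE_DELAYED_WORK(example_reclaim_dwork, example_reclaim_work);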

However, we have hit a few OOM situations since we switched to the new
kernel because of ceph_cap_reclaim_work events (the OOM happens because
Apache threads keep piling up when they cannot access CephFS). We then
also see MDS slow ops reported. This might be related to a backup job
that is running on a backup server. We did not observe this behavior on
the 5.12.19 kernel.

We haven't touched the reclaim-related code for a long time. I just suspect the backup slowed down the performance.

BTW, do you have the MDS logs about the slow requests? What are they?

Thanks

- Xiubo


Adding Xiubo.

Thanks,

                 Ilya

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



