On Wed, Sep 13, 2023 at 4:49 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> On 13-09-2023 14:58, Ilya Dryomov wrote:
> > On Wed, Sep 13, 2023 at 9:20 AM Stefan Kooman <stefan@xxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> Since the 6.5 kernel addressed the regression in the readahead
> >> handling code... we went ahead and installed this kernel on a couple
> >> of mail / web clusters (Ubuntu 6.5.1-060501-generic #202309020842
> >> SMP PREEMPT_DYNAMIC Sat Sep 2 08:48:34 UTC 2023 x86_64 x86_64 x86_64
> >> GNU/Linux). Since then we occasionally see the following being
> >> logged by the kernel:
> >>
> >> [Sun Sep 10 07:19:00 2023] workqueue: delayed_work [ceph] hogged CPU for
> >> >10000us 4 times, consider switching to WQ_UNBOUND
> >> [Sun Sep 10 08:41:24 2023] workqueue: ceph_con_workfn [libceph] hogged
> >> CPU for >10000us 4 times, consider switching to WQ_UNBOUND
> >> [Sun Sep 10 11:05:55 2023] workqueue: delayed_work [ceph] hogged CPU for
> >> >10000us 8 times, consider switching to WQ_UNBOUND
> >> [Sun Sep 10 12:54:38 2023] workqueue: ceph_con_workfn [libceph] hogged
> >> CPU for >10000us 8 times, consider switching to WQ_UNBOUND
> >> [Sun Sep 10 19:06:37 2023] workqueue: ceph_con_workfn [libceph] hogged
> >> CPU for >10000us 16 times, consider switching to WQ_UNBOUND
> >> [Mon Sep 11 10:53:33 2023] workqueue: ceph_con_workfn [libceph] hogged
> >> CPU for >10000us 32 times, consider switching to WQ_UNBOUND
> >> [Tue Sep 12 10:14:03 2023] workqueue: ceph_con_workfn [libceph] hogged
> >> CPU for >10000us 64 times, consider switching to WQ_UNBOUND
> >> [Tue Sep 12 11:14:33 2023] workqueue: ceph_cap_reclaim_work [ceph]
> >> hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
> >>
> >> We wonder if this is a new phenomenon, or whether it was simply not
> >> logged by older kernels.
> >
> > Hi Stefan,
> >
> > This is something that wasn't logged in older kernels. The kernel
> > workqueue infrastructure considers these Ceph work items CPU-intensive
> > and reports that in dmesg. This is new in the 6.5 kernel; the
> > threshold can be tweaked with the workqueue.cpu_intensive_thresh_us
> > parameter.
>
> Thanks. I was just looking into this (WQ_UNBOUND, alloc_workqueue(),
> etc.). The patch by Tejun Heo on workqueue also mentions this:
>
> * Concurrency-managed per-cpu work items that hog CPUs and delay the
> execution of other work items are now automatically detected and
> excluded from concurrency management. Reporting on such work items can
> also be enabled through a config option.
>
> This does imply that the Ceph work items are "excluded from concurrency
> management". Is that correct? And if so, what does that mean in
> practice? Might it make the process of returning / claiming caps to
> the MDS slower?

I haven't had the time to look into this in detail, but I did see
a couple of recent changes in the area, which I tagged as "something of
interest" in my inbox:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c25da5b7baf1d243e6612ba2b97e2a2c4a1376f6
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f959325e6ac3f499450088b8d9c626d1177be160

Thanks,

                Ilya
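
For readers wondering what the "consider switching to WQ_UNBOUND" hint
amounts to in code: below is a minimal sketch of a kernel module that
allocates an unbound workqueue. It is not taken from the ceph/libceph
sources; the names (example_wq, example_work_fn) are made up for
illustration, and the only point of interest is the WQ_UNBOUND flag
passed to alloc_workqueue().

/*
 * Minimal sketch, NOT the ceph/libceph code: all names here are
 * hypothetical, only the WQ_UNBOUND flag is the point.
 */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;

static void example_work_fn(struct work_struct *work)
{
        /* A long-running, CPU-intensive job would live here. */
}
static DECLARE_WORK(example_work, example_work_fn);

static int __init example_init(void)
{
        /*
         * Work items on a WQ_UNBOUND workqueue are served by unbound
         * kworker pools and are not subject to per-CPU concurrency
         * management, so a CPU hog here cannot delay other work items
         * sharing a per-CPU pool (and does not trigger the detection
         * that produces the dmesg lines above).
         */
        example_wq = alloc_workqueue("example_wq", WQ_UNBOUND, 0);
        if (!example_wq)
                return -ENOMEM;

        queue_work(example_wq, &example_work);
        return 0;
}

static void __exit example_exit(void)
{
        destroy_workqueue(example_wq);  /* drains pending work first */
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

Unbound work items give up CPU locality in exchange for not being
concurrency-managed; the quoted 6.5 change applies roughly the same
exclusion automatically (and noisily) to per-cpu work items it flags
as CPU hogs, which is what the warnings are reporting.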
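As for the threshold Ilya mentions: in the upstream 6.5 sources,
workqueue.cpu_intensive_thresh_us is a module parameter of the
workqueue subsystem with a default of 10000us, which matches the
">10000us" in the log lines above. It can be raised on the kernel
command line (e.g. workqueue.cpu_intensive_thresh_us=20000 -- the
value here is only an example) and, assuming the parameter is exported
read-write as it appears to be upstream, changed at runtime through
/sys/module/workqueue/parameters/cpu_intensive_thresh_us. Raising it
only affects the detection and reporting (the dmesg lines themselves
are gated behind a config option, CONFIG_WQ_CPU_INTENSIVE_REPORT if
I recall the name correctly, which the Ubuntu kernel evidently
enables); it does not change how much CPU the work items consume.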