Re: 6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU

On 14-09-2023 10:03, Xiubo Li wrote:

<----- snip ----->

Okay, there were stuck caps revocations, which could cause create/lookup/getattr requests to get stuck and then be reported as slow requests.

This should be a known issue that I am now working on. So far I have found one case, unlinking, that can cause it, and these warnings should eventually disappear once the kclient releases the caps. For more detail, please see my comments on this tracker.

This tracker is not finished yet, and there could be other cases that cause it. I will continue working on it this week and next.

Ah, well, in that case, you're in for a treat ;-). We have a mail cluster (dovecot on CephFS) that is spamming us continuously (during work hours) with this. We could send you (a lot of) logs if you want to have more examples of production setups. This behavior has not changed between all the kernels we have tried (currently 6.5). Shall I upload them to tracker 50223?

Would be great to get this fixed. Let us know if you need us to test anything.

On a side note: separate kclient metrics for "# acquired caps" and "# released caps" might be useful as well, since a kclient can be very busy releasing caps while also acquiring new ones. A single total number of caps does not give this insight.
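
To illustrate with made-up numbers, here is a minimal sketch of why two cumulative counters would show what a single total hides. The counter names and the sample values are hypothetical; these are not existing kclient or MDS metrics.

```python
from typing import NamedTuple


class CapSample(NamedTuple):
    """One sample of two hypothetical cumulative per-client counters."""
    acquired: int  # caps acquired by the client since mount (assumed counter)
    released: int  # caps released by the client since mount (assumed counter)

    @property
    def held(self) -> int:
        # The single "total caps" figure that is available today.
        return self.acquired - self.released


def interval_activity(prev: CapSample, curr: CapSample) -> dict:
    """Per-interval activity derived from two consecutive samples."""
    return {
        "acquired": curr.acquired - prev.acquired,
        "released": curr.released - prev.released,
        "net": curr.held - prev.held,
    }


# A busy client: 40k caps acquired and 40k released in the interval.
busy = interval_activity(CapSample(100_000, 50_000), CapSample(140_000, 90_000))
# An idle client: nothing happened in the interval.
idle = interval_activity(CapSample(100_000, 50_000), CapSample(100_000, 50_000))

print(busy)
print(idle)
```

Both clients show a net change of zero in held caps, so a graph of the total would look identical for them, while the separate counters make the churn on the busy client visible.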


Gr. Stefan
