On 23.01.24 at 13:09, Ilya Dryomov wrote:
On Tue, Jan 23, 2024 at 7:59 AM Amon Ott <lists@xxxxxxxxxxxxxxx> wrote:
On 23.01.24 at 05:58, Venky Shankar wrote:
The trackers you mention would be a part of the next quincy and reef releases.
Thank you! For now, the systems seem to run fine with Linux kernel 5.10
on the client side, so we have a workaround. The problems started after
we switched to 6.1 - maybe that information helps identify what
triggered these bugs.
Are any of your CephFS pools nearfull? If so, you are likely affected
by [1] and would need to add capacity or bump the nearfull threshold.
[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/E2QJ6K4UBYFM2RNRHHCX5CABYMDESNTC/
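For reference, checking the nearfull state and bumping the threshold can be done with standard Ceph CLI commands; the ratio value below is only an example, and whether raising it is appropriate depends on the cluster:

```shell
# Show any nearfull/full warnings currently raised
ceph health detail

# Inspect the configured full/nearfull ratios
ceph osd dump | grep ratio

# Temporarily raise the nearfull threshold (example value;
# the default is 0.85 - raise with care, this is a stopgap,
# not a substitute for adding capacity)
ceph osd set-nearfull-ratio 0.90
```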
Thank you, but no, the CephFS had plenty of space, and we saw these
issues on several systems. The clients do not respond to capability
release requests in time, or at all. We see lots of slow requests in
the Ceph log, and sometimes the MDS goes read-only. In both cases the
whole cluster stalls. As a workaround, we started avoiding renames and
moved some temp file areas to local disks. That helped, but was not
enough.
Unfortunately, although blocking a client helps with the slow requests,
that client then hangs and needs a hard reboot. Mounting with
recover_session=clean did not help.
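For anyone reproducing this, a sketch of the two operations mentioned above; monitor address, filesystem name, client name, and session ID are placeholders:

```shell
# Mount CephFS with recover_session=clean so a blocklisted client
# attempts to reconnect with a clean session instead of hanging
# (mon address, fs name, and client name are placeholders)
mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs \
    -o name=admin,fs=cephfs,recover_session=clean

# List MDS sessions to find the misbehaving client's session ID
ceph tell mds.0 session ls

# Evict (block) that client - as noted above, on 6.1 kernels this
# left the client hung until a hard reboot (ID is a placeholder)
ceph tell mds.0 client evict id=4305
```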
With kernel 5.10 on the clients everything works fine.
Amon.
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx