Hi all,

we noticed a massive drop in the number of requests per second that a CephFS client is able to perform when we do a recursive chown over a directory with millions of files. As soon as we see about 170k caps on the MDS, the client performance drops from about 660 reqs/sec to 70 reqs/sec.
When we then clear dentries and inodes using "sync; echo 2 > /proc/sys/vm/drop_caches" on the client, the request rate goes back up to ~660 reqs/sec, only to drop again when the cap count reaches about 170k.
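For anyone who wants to reproduce the workaround from a script, here is a minimal Python sketch of the same "sync; echo 2 > /proc/sys/vm/drop_caches" step. It assumes a Linux client and root privileges. The function name and the `path` parameter are only illustrative (the parameter exists so the function can be pointed at a scratch file for a dry run). It only reproduces the workaround and does not address the underlying slowdown.

```python
import os

# Path taken from the command quoted above; writing "2" asks the kernel
# to free reclaimable dentries and inodes.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_dentries_and_inodes(path=DROP_CACHES):
    """Flush dirty data to disk, then write "2" to the drop_caches file."""
    os.sync()  # equivalent of the leading `sync`
    with open(path, "w") as f:
        f.write("2\n")  # equivalent of `echo 2 > path`


if __name__ == "__main__":
    # Needs root when run against the real /proc path.
    drop_dentries_and_inodes()
```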
See the attached screenshots.
When we stop the chown process for a while and restart it ~25 min later, it still performs very slowly and the MDS request rate remains low (~60 reqs/sec). Clearing the cache (dentries and inodes) on the client restores the performance again.
When we run the same chown on another client in parallel, it starts out with reasonably good performance (while the first client is still performing poorly), but eventually it becomes slow as well, just like the first client.
Can someone comment on this and explain it? How can this be solved so that the performance remains stable?
We are running ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable) on all ceph cluster nodes and on all clients. The OS on all ceph cluster nodes and client nodes is CentOS 7.9. The filesystem is mounted via the CentOS kernel client (latest official version).
Thanks in advance.

Best
Dietmar

--
Dietmar Rieder, Mag.Dr.
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rieder@xxxxxxxxxxx
Web: http://www.icbi.at
Attachment: Grafana-Ceph-Cluster_MDSreq.png (PNG image)
Attachment: Grafana-Ceph-Cluster-MDScaps.png (PNG image)