Hi,
On 1/18/21 5:46 PM, Dietmar Rieder wrote:
> Hi all,
> we noticed a massive drop in the requests per second a cephfs client is
> able to perform when we do a recursive chown over a directory with
> millions of files. As soon as we see about 170k caps on the MDS, the
> client performance drops from about 660 reqs/sec to 70 reqs/sec.
> When we then clear dentries and inodes using "sync; echo 2 >
> /proc/sys/vm/drop_caches" on the client, the requests go up to ~660
> again, just to drop again when reaching about 170k caps.
> See the attached screenshots.
> When we stop the chown process for a while and restart it ~25min later,
> it still performs very slowly and the MDS reqs/sec remain low
> (~60/sec.). Clearing the cache (dentries and inodes) on the client
> restores the performance again.
> When we run the same chown on another client in parallel, it starts
> with reasonably good performance (while the first client is
> performing poorly) but eventually it gets slow again just like the
> first client.
> Can someone comment on this and explain it?
> How can this be solved, so that the performance remains stable?
The MDS has a (soft) limit on the number of caps per client. If a client
starts to request more caps, the MDS will ask it to release caps. This
adds an extra network round trip and thus increases processing time.
The setting is 'mds_max_caps_per_client'. The default value is 1 million
caps per client, but maybe this setting was changed in your configuration,
or the overall cap limit for the MDS is restricting it.
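
To put the numbers from the original report in perspective, here is a rough back-of-the-envelope calculation. It assumes the client issues its requests strictly one after another (an assumption, the thread does not confirm it), so that the request rate translates directly into time per request:

```python
# Rates reported in the original message (requests per second).
FAST_RATE = 660.0
SLOW_RATE = 70.0


def per_request_ms(rate_per_sec: float) -> float:
    """Average time per request in milliseconds, assuming serial requests."""
    return 1000.0 / rate_per_sec


fast_ms = per_request_ms(FAST_RATE)
slow_ms = per_request_ms(SLOW_RATE)
extra_ms = slow_ms - fast_ms

print(f"fast:  {fast_ms:.2f} ms/request")   # 1.52
print(f"slow:  {slow_ms:.2f} ms/request")   # 14.29
print(f"extra: {extra_ms:.2f} ms/request")  # 12.77
```

Under that assumption, each request takes roughly 13 ms longer once the slowdown sets in.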
Since each assigned cap increases the memory consumption of the MDS,
setting an upper limit helps to control the overall amount of memory the
MDS is using. So the memory target also affects the number of active
caps an MDS can manage. You need to adjust both values to your use case.
I would also recommend monitoring the cap usage of the MDS, e.g. by
running 'ceph daemonperf mds.<mds name>' in a shell on the MDS server.
Other methods using the various monitoring interfaces provided by ceph
are also possible.
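
One such method is to look at the per-client cap counts in the JSON printed by 'ceph daemon mds.<mds name> session ls'. The sketch below assumes each session entry carries an "id", a "num_caps" field, and a "client_metadata" object with a "hostname"; the sample data is invented to match that shape and is not real cluster output:

```python
import json
from typing import List, Tuple


def caps_per_client(session_ls_json: str) -> List[Tuple[str, int]]:
    """Return (client label, num_caps) pairs, largest cap holder first."""
    sessions = json.loads(session_ls_json)
    rows = []
    for s in sessions:
        host = s.get("client_metadata", {}).get("hostname", "unknown")
        label = f"client.{s.get('id', '?')}@{host}"
        rows.append((label, int(s.get("num_caps", 0))))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Made-up sample in the assumed shape, for illustration only.
SAMPLE = """[
  {"id": 4411, "num_caps": 1520, "client_metadata": {"hostname": "node-b"}},
  {"id": 4305, "num_caps": 170213, "client_metadata": {"hostname": "node-a"}}
]"""

for label, caps in caps_per_client(SAMPLE):
    print(f"{label:30s} {caps:>10d}")
```

Feeding it the real output of 'session ls' at regular intervals should show which client is approaching the point where the slowdown starts.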
There are also settings that control how quickly a client releases caps
for files; tweaking these settings on the client side may also help in
your case.
Regards,
Burkhard