Re: Provide more documentation for MDS performance tuning on large file systems

Thanks. I tried playing around a bit with mds_export_ephemeral_distributed just now, because it's pretty much the same thing that your script does manually. Unfortunately, it seems to have no effect.

I pinned all top-level directories to mds.0 and then enabled ceph.dir.pin.distributed for a few subtrees. Despite mds_export_ephemeral_distributed being set to true, all work is done by mds.0 now and I also don't see any additional pins in the output of ceph tell mds.* get subtrees.
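
For reference, the steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact commands that were run: the helper names (plan_pins, apply_pins, summarize_subtrees) and the example paths are hypothetical, and the JSON field names (dir.path, auth_first, export_pin) are assumed to match what get subtrees prints on the release in use. The xattrs are written with Python's os.setxattr, which is equivalent to calling setfattr on a mounted CephFS.

```python
import json
import os


def plan_pins(top_level_dirs, distributed_subtrees, rank=0):
    """Return (path, xattr name, value) tuples; nothing is written here."""
    plan = [(d, "ceph.dir.pin", str(rank)) for d in top_level_dirs]
    plan += [(d, "ceph.dir.pin.distributed", "1") for d in distributed_subtrees]
    return plan


def apply_pins(plan):
    """Write the planned xattrs on a mounted CephFS (Linux only)."""
    for path, name, value in plan:
        os.setxattr(path, name, value.encode())


def summarize_subtrees(subtrees_json):
    """Extract (path, auth rank, export pin) from `get subtrees` JSON output.

    The field names are an assumption about the output format.
    """
    return [
        (s["dir"]["path"], s["auth_first"], s["export_pin"])
        for s in json.loads(subtrees_json)
    ]
```

Calling plan_pins(["/mnt/cephfs/data"], ["/mnt/cephfs/data/users"]) only builds the list, so it can be reviewed before apply_pins touches the file system. The output of ceph tell mds.0 get subtrees can then be passed to summarize_subtrees to check which rank is authoritative for each subtree.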

Any ideas why that might be?


On 07/12/2020 10:49, Dan van der Ster wrote:
On Mon, Dec 7, 2020 at 10:39 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:

What exactly do you set to 64k?
We used to set mds_max_caps_per_client to 50000, but once we started
using the tuned caps recall config, we reverted that back to the
default 1M without issue.
mds_max_caps_per_client. As I mentioned, some clients hit this limit
regularly and they aren't entirely idle. I will keep tuning the recall
settings, though.
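
As a small illustration of the setting discussed here, the limit could be changed through the central config store along these lines. This is a sketch only: the helper names are hypothetical, the value 50000 is the one quoted above, and the command is printed rather than executed unless dry_run is turned off.

```python
import subprocess


def caps_limit_command(limit):
    """Build the `ceph config set` command for mds_max_caps_per_client."""
    return ["ceph", "config", "set", "mds", "mds_max_caps_per_client", str(limit)]


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(caps_limit_command(50000))
```

Keeping the dry run as the default makes it easy to review the exact command before applying it to a production cluster.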

This 15k caps client I mentioned is not related to the max caps per
client config. In recent nautilus, the MDS will proactively recall
caps from idle clients -- so a client with even just a few caps like
this can provoke the caps recall warnings (if it is buggy, like in
this case). The client doesn't cause any real problems, just the
annoying warnings.
We only see the warnings during normal operation. I remember having
massive issues with early Nautilus releases, but thanks to more
aggressive recall behaviour in newer releases, that is fixed. Back then
it was virtually impossible to keep the MDS within the bounds of its
memory limit. Nowadays, the warnings only appear when the MDS is really
stressed. In that situation, the whole FS performance is already
degraded massively and MDSs are likely to fail and run into the rejoin loop.

Multi-active + pinning definitely increases the overall metadata throughput
(once you can get the relevant inodes cached), because as you know the
MDS is single-threaded and CPU-bound at the limit.
We could get something like 4-5k handle_client_requests out of a
single MDS, and that really does scale horizontally as you add MDSs
(and pin).
Okay, I will definitely re-evaluate options for pinning individual
directories, perhaps a small script can do it.
There is a new ephemeral pinning option in the latest releases,
but we didn't try it yet.
Here's our script -- it assumes the parent dir is pinned to rank 0 or
that the balancer is disabled:

https://github.com/cernceph/ceph-scripts/blob/master/tools/cephfs/cephfs-bal-shard

Too many pins can cause problems -- we have something like 700 pins at
the moment and it's fine, though.
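
To illustrate the general idea of manual pinning, a minimal round-robin sketch might look like the following. It is not the cephfs-bal-shard script linked above and makes no claim to match its logic; the function names are hypothetical, and it simply assumes that every immediate subdirectory of a parent should get its own ceph.dir.pin.

```python
import os


def round_robin_pins(parent, num_ranks):
    """Map each immediate subdirectory of `parent` to an MDS rank, round-robin."""
    subdirs = sorted(
        entry.path
        for entry in os.scandir(parent)
        if entry.is_dir(follow_symlinks=False)
    )
    return {path: i % num_ranks for i, path in enumerate(subdirs)}


def apply_export_pins(pins):
    """Set ceph.dir.pin on each directory of a mounted CephFS (Linux only)."""
    for path, rank in pins.items():
        os.setxattr(path, "ceph.dir.pin", str(rank).encode())
```

Since each subdirectory gets one pin, the total number of pins grows with the number of directories under the parent, which is worth keeping in mind given the caveat about too many pins.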

Cheers, Dan


