Hello all,

I've got a Ceph (12.2.8) cluster with 27 servers, 500 OSDs, and 1000 CephFS mounts (kernel client). We're currently only using 1 active MDS.

Performance is great about 80% of the time. MDS responses (per ceph daemonperf mds.$(hostname -s)) show 2k-9k requests per second, with a latency under 100. It's the other 20-ish percent I'm worried about. I'll check on it and it will go 5-15 seconds with "0" requests and "0" latency, then give me 2 seconds of reasonable response times, and then go back to nothing. Clients are actually seeing blocked requests during these periods.

The strange bit is that when I *reduce* the mds_cache_size, requests and latencies go back to normal for a while. When it happens again, I'll increase it back to where it was. It feels like the MDS decides that some of these inodes can't be dropped from the cache unless the cache size changes. Maybe something is wrong with the LRU?

I feel like I've got a reasonable cache size for my workload: 30 GB on the small end, 55 GB on the large. There's no real reason for a swing that large, other than hoping the larger size delays the next occurrence for longer. I also suspect there's some magic tunable that changes how inodes get stuck in the LRU, perhaps mds_cache_mid. Does anyone know what this tunable actually does? The documentation is a little sparse.

I can grab logs from the MDS if needed; just let me know which settings you'd like to see.

--
Adam
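P.S. In case concrete commands help, this is roughly what I mean by changing the cache size at runtime. The daemon name (mds.ceph-mds01) and the byte values are just placeholders; I'm showing mds_cache_memory_limit (the byte-based limit in luminous) since I'm thinking in GB, but the same injectargs approach works for the older mds_cache_size inode count:

    # placeholder daemon name; substitute your active MDS
    MDS=mds.ceph-mds01

    # check what the cache currently thinks it is using, and the current knobs
    ceph daemon $MDS cache status
    ceph daemon $MDS config get mds_cache_memory_limit
    ceph daemon $MDS config get mds_cache_mid

    # shrink the limit (~30 GiB), then raise it again once things recover (~55 GiB)
    ceph tell $MDS injectargs '--mds_cache_memory_limit=32212254720'
    ceph tell $MDS injectargs '--mds_cache_memory_limit=59055800320'

And for logs, something along these lines is what I'd plan to capture during one of the stalls (again, placeholder daemon name; the debug level is just a guess at what's useful, so tell me if you want something else):

    # temporarily raise MDS debug logging on the active MDS
    ceph tell mds.ceph-mds01 injectargs '--debug_mds=10'

    # while a stall is happening, grab in-flight ops and perf counters
    ceph daemon mds.ceph-mds01 dump_ops_in_flight > ops_in_flight.json
    ceph daemon mds.ceph-mds01 perf dump > perf_dump.json

    # drop logging back to the default, then collect /var/log/ceph/ceph-mds.*.log
    ceph tell mds.ceph-mds01 injectargs '--debug_mds=1/5'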