Re: Cephfs mds cache tuning

Okay, here's what I've got: https://www.paste.ie/view/abe8c712
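
(For completeness, this is roughly how I captured those while the MDS
was stalled; the daemon name is whatever the active MDS is called on
that host, and dump_ops_in_flight should be the ops-in-progress dump
Greg asked about:

    # perf counters from the MDS admin socket
    ceph daemon mds.$(hostname -s) perf dump > perf_dump.json

    # operations currently in flight on that MDS
    ceph daemon mds.$(hostname -s) dump_ops_in_flight > ops_in_flight.json

Both are admin-socket commands, so they have to run on the host where
that MDS daemon lives.)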

Of note, I've changed things up a little for the moment. I've
activated a second mds to see whether a particular subtree is more
prone to issues (maybe EC vs. replica). The mds that is currently
slow is the one my EC volume is pinned to.
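
(In case it helps, enabling the second rank and pinning the EC subtree
looked roughly like the following; the filesystem name and directory
path are placeholders rather than my real ones:

    # allow a second active MDS rank
    ceph fs set cephfs max_mds 2

    # pin a directory subtree to rank 1 (via setfattr on the mount)
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/ec-volume

Directories that aren't explicitly pinned are left to the default
balancer.)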

--
Adam
On Mon, Oct 1, 2018 at 10:02 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> Can you grab the perf dump during this time, perhaps plus dumps of the ops in progress?
>
> This is weird, but given it’s somewhat periodic, it might be something like the MDS needing to catch up on log trimming (though I’m unclear why changing the cache size would impact this).
>
> On Sun, Sep 30, 2018 at 9:02 PM Adam Tygart <mozes@xxxxxxx> wrote:
>>
>> Hello all,
>>
>> I've got a ceph (12.2.8) cluster with 27 servers, 500 osds, and 1000
>> cephfs mounts (kernel client). We're currently only using 1 active
>> mds.
>>
>> Performance is great about 80% of the time. MDS responses (per ceph
>> daemonperf mds.$(hostname -s)) indicate 2k-9k requests per second,
>> with a latency under 100.
>>
>> It is the other 20-ish percent I'm worried about. I'll check on it
>> and it will go 5-15 seconds with "0" requests and "0" latency, then
>> give me 2 seconds of reasonable response times, and then go back to
>> nothing. Clients are actually seeing blocked requests during this
>> period of time.
>>
>> The strange bit is that when I *reduce* the mds_cache_size, requests
>> and latencies go back to normal for a while. When it happens again,
>> I'll increase it back to where it was. It feels like the mds server
>> decides that some of these inodes can't be dropped from the cache
>> unless the cache size changes. Maybe something wrong with the LRU?
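
(For reference, either cache knob can be adjusted at runtime through
the admin socket; the byte values below are just my 30GB/55GB endpoints
expressed for the memory-based limit:

    # inode-count limit (the mds_cache_size mentioned above)
    ceph daemon mds.$(hostname -s) config get mds_cache_size

    # byte-based limit, if the cache is sized in GB instead
    ceph daemon mds.$(hostname -s) config set mds_cache_memory_limit 32212254720  # ~30GB
    ceph daemon mds.$(hostname -s) config set mds_cache_memory_limit 59055800320  # ~55GB

ceph tell mds.<id> injectargs works as well if the admin socket isn't
handy.)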
>>
>> I feel like I've got a reasonable cache size for my workload: 30GB
>> on the small end, 55GB on the large. There's no real reason for a
>> swing this large, other than hoping that a bigger increase delays
>> the recurrence for longer.
>>
>> I also feel like there is probably some magic tunable that changes
>> how inodes get stuck in the LRU, perhaps mds_cache_mid. Does anyone
>> know what this tunable actually does? The documentation is a little
>> sparse.
>>
>> I can grab logs from the mds if needed, just let me know the settings
>> you'd like to see.
>>
>> --
>> Adam
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



