Re: [Ceph-users] Re: MDS failing under load with large cache sizes

+ other ceph-users

On Wed, Jul 24, 2019 at 10:26 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> > What's the ceph.com mailing list? I wondered whether this list is dead, but it's the list announced on the official ceph.com homepage, isn't it?
> There are two mailing lists announced on the website. If you go to
> https://ceph.com/resources/ you will find the
> subscribe/unsubscribe/archive links for the (much more active) ceph.com
> MLs. But if you click on "Mailing Lists & IRC page" you will get to a
> page where you can subscribe to this list, which is different. Very
> confusing.

It is confusing. This is supposed to be the new ML, but I don't think
the migration has started yet.

> > What did you have the MDS cache size set to at the time?
> >
> > < and an inode count between
>
> I actually did not think I'd get a reply here. We are a bit further along
> than this on the other mailing list. This is the thread:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036095.html
>
> To sum it up: the ceph client prevents the MDS from freeing its cache,
> so inodes keep piling up until either the MDS becomes too slow (fixable
> by increasing the beacon grace time) or runs out of memory. The latter
> will happen eventually. In the end, my MDSs couldn't even rejoin because
> they hit the host's 128GB memory limit and crashed.
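
As an aside on the beacon grace time you mention: that knob is
mds_beacon_grace, and it can be raised with something like the
following (300 seconds is only an illustrative value):

ceph config set global mds_beacon_grace 300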

It's possible the MDS is not being aggressive enough in asking the
single (?) client to reduce its cache size. There were recent changes
[1] to the MDS to improve this. However, the defaults may not be
aggressive enough for your client's workload. Can you try:

ceph config set mds mds_recall_max_caps 10000
ceph config set mds mds_recall_max_decay_rate 1.0
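
To see whether recall is having an effect, something like the following
should list each client session along with the number of caps it
currently holds (num_caps); replace <name> with your active MDS:

ceph tell mds.<name> session ls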

Also, your other mailings made me think you may still be using the old
inode-count limit for the cache size. Are you using the new
mds_cache_memory_limit config option?
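
If not, something along these lines sets the memory-based limit (the
16 GiB value here is only an example and should be sized to your MDS
host); the old mds_cache_size option counted inodes rather than bytes:

ceph config set mds mds_cache_memory_limit 17179869184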

Finally, if this fixes your issue (please let us know!) and you decide
to try multiple active MDS daemons, you should definitely use pinning, as
the parallel create workload will greatly benefit from it.
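
For example (the filesystem name, path, and rank below are only
placeholders), the number of active daemons is raised with max_mds and
a directory tree is then pinned to a particular rank via an extended
attribute on a mounted client:

ceph fs set <fs_name> max_mds 2
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir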

[1] https://ceph.com/community/nautilus-cephfs/

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


