Re: [Ceph-users] Re: MDS failing under load with large cache sizes

On Thu, Dec 5, 2019 at 10:31 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> I had similar issues again today. Some users were trying to train a
> neural network on several million files, resulting in enormous cache
> sizes. Due to my custom cap recall and decay rate settings, the MDSs
> were able to withstand the load for quite some time, but at some point
> the active rank crashed, taking the whole CephFS down.
>
> As usual, the MDSs were playing round-robin Russian roulette trying to
> recover the cache, only to be killed by the MONs after some time. I
> tried increasing the beacon grace time, but it didn't help; the MONs
> were still kicking MDSs after what seemed like a random timeout.

You set mds_beacon_grace?
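
For reference, if you want to try that again, something along these
lines should raise the grace period cluster-wide (the 120 seconds is
only an example value; depending on the release you may also need to
set it on the MONs explicitly):

    ceph config set global mds_beacon_grace 120   # example value, in seconds
    ceph config get mds mds_beacon_grace          # check what the MDSs will see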

> Even with the
> setting to wipe the MDS cache on startup, the CephFS was unable to
> recover. I had to manually delete the mds0_openfiles.* objects from the
> CephFS metadata pool, of which I had a total of nine. Only then was I
> able to get the MDS back into a working state.

Yes, this optimization struggles with large cache sizes (ironically).
Luckily, nuking the open file objects is harmless...
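
For the record, clearing them out looks roughly like this (the pool
name is just an example, substitute your actual CephFS metadata pool,
and ideally do this only while the rank in question is down):

    # list the open file table objects for rank 0 (example pool name)
    rados -p cephfs_metadata ls | grep '^mds0_openfiles'
    # then remove each of them, e.g.
    rados -p cephfs_metadata rm mds0_openfiles.0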

> I know there are some unreleased patches to improve the MDS behaviour as
> a result of this thread. Is there any timeline for when those will be
> available?

14.2.5: https://tracker.ceph.com/issues/41467

> This issue is rather critical. What I need is a faster cap
> recall (which I think has been fixed, but not yet released) as
> well as probably some kind of hard limit after which a client has to
> release file handles.

MDS will soon be more aggressive about recalling caps from idle
sessions, which may help: https://tracker.ceph.com/issues/41865

That will probably land in 14.2.6.
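
Until then, the recall behaviour can already be made more aggressive
via the existing knobs; a rough sketch (example values only, check the
defaults of your release before changing anything):

    ceph config set mds mds_recall_max_caps 30000       # caps recalled from a session at a time (example value)
    ceph config set mds mds_recall_max_decay_rate 1.5   # decay rate of the per-session recall throttle (example value)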

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


