Re: Too slow CephFS MDS Restart (Recovery) Performance with Many Sessions and Large Cache Size

Zizon Qiu <zzdtsv@xxxxxxxxx> · Thu, 1 Apr 2021 22:31:27 +0800

On Thu, Apr 1, 2021 at 8:39 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Wed, Mar 31, 2021 at 6:46 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> >
> > Hello Yongseok,
> >
> > On Wed, Mar 31, 2021 at 1:13 AM Yongseok Oh <yongseok.oh@xxxxxxxxxxxx> wrote:
> > >
> ...
> > > > A few things I have analyzed
> > > > - Rejoining process consumes a considerable amount of time. That's a known issue. (Sometimes respawning MDS happened. Increasing mds_heartbeat_grace doesn't help.)
> >
> > Please turn up logging to:
> >
> > debug_mds = 5
> >
> > to get an idea what the MDS is doing when respawn occurs.

> If it helps, here is a log with 2/5 from a recent failover which took
> 3.5 minutes: https://termbin.com/b022
>
> This is 14.2.11 with the optimized recall/cache tuning.

The "Updating MDS map to version" messages keep showing up from init-rejoin, at around 2021-03-18 17:12:52.863, until 2021-03-18 17:22:36.356, while the rejoin itself finished at 2021-03-18 17:15:58.325.
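To put numbers on that, here is a quick Python check using only the three timestamps above (the variable names are just labels):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps taken from the failover log discussed above.
init_rejoin = datetime.strptime("2021-03-18 17:12:52.863", FMT)
rejoin_done = datetime.strptime("2021-03-18 17:15:58.325", FMT)
last_map_update = datetime.strptime("2021-03-18 17:22:36.356", FMT)

# How long rejoin itself took, measured from init-rejoin.
rejoin_duration = rejoin_done - init_rejoin           # 0:03:05.462000
# How long map updates kept arriving, measured from init-rejoin.
map_update_window = last_map_update - init_rejoin     # 0:09:43.493000
# How long map updates continued after rejoin had already finished.
after_rejoin = last_map_update - rejoin_done          # 0:06:38.031000

print("rejoin:", rejoin_duration)
print("map updates after init-rejoin:", map_update_window)
print("map updates after rejoin finished:", after_rejoin)
```

That is roughly 3 minutes of rejoin, with map updates still arriving for about 6.5 minutes after rejoin had finished.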

So, if this is related to Paxos (with too many changes in between), maybe pinning some subtrees/directories to a particular MDS rank would help too.
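The pinning meant here is CephFS export pinning through the ceph.dir.pin extended attribute (what `setfattr -n ceph.dir.pin -v <rank> <dir>` sets). A minimal sketch, assuming a Linux client with CephFS mounted at a hypothetical /mnt/cephfs and a hypothetical directory-to-rank layout; the target ranks must exist (max_mds), and whether pinning actually shortens rejoin in this case is untested:

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical layout: directories on a CephFS mount -> MDS rank that should own them.
PINS = {
    "/mnt/cephfs/projects": 0,
    "/mnt/cephfs/home": 1,
}

SetXattr = Callable[[str, str, bytes], None]


def pin_subtrees(pins: Mapping[str, int], setxattr: Optional[SetXattr] = None) -> None:
    """Export-pin each directory's subtree to an MDS rank via the ceph.dir.pin xattr.

    A rank of -1 removes the pin, so the directory falls back to its parent's pin.
    `setxattr` defaults to os.setxattr (Linux only); it is injectable for testing.
    """
    if setxattr is None:
        setxattr = os.setxattr
    for path, rank in pins.items():
        if rank < -1:
            raise ValueError(f"invalid MDS rank {rank} for {path}")
        # The xattr value is the rank as a decimal string, like `setfattr -v 1`.
        setxattr(path, "ceph.dir.pin", str(rank).encode())
```

On an actual CephFS mount this would be called as `pin_subtrees(PINS)`.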

> Indeed rejoin is always the longest step -- even with cephfs_metadata
> on SSDs. These MDSs had the cache limit set to 8GB, and you can see
> that the rejoining MDS needed 56GB while booting.
>
> I haven't had a chance to test the rejoin/openfiletables optimizations
> yet. (https://github.com/ceph/ceph/pull/37383)
> But I had understood that this is intended to decrease that rejoin
> memory usage -- will it also speed things up?
>
> -- dan

_______________________________________________

Dev mailing list -- dev@xxxxxxx

To unsubscribe send an email to dev-leave@xxxxxxx
