Re: MDS stuck ops

Hi Frank,

On Tue, Nov 29, 2022 at 5:38 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Venky,
>
> maybe you can help me clarify the situation a bit. I don't understand the difference between the two pinning implementations you describe in your reply, and I also don't see any difference in meaning between the octopus and quincy documentation; the difference is just in wording. Both texts state that "all of a directory’s immediate children should be ephemerally pinned" (octopus) and "This has the effect of distributing immediate children across a range of MDS ranks" (quincy).
>
> To me, both mean that, if I enable distributed ephemeral pinning on /home, then for every child /home/X of /home it follows that /home/X and any directory under /home/X/ are pinned to the same MDS rank, meaning their information in cache exists on this rank only and no other MDS is serving requests for any of these directories.
>
> Is there something wrong with this interpretation?

Distributed ephemeral pins will distribute immediate children across a
range of MDS ranks - /home/X might be on rank 1, /home/Y on rank 2,
/home/Z on rank 0, and so on.
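
For reference, a minimal sketch of how it is enabled (the mount point
/mnt/cephfs is just an example; any client that can set xattrs on the
directory works the same way):

    import os

    MOUNT = "/mnt/cephfs"   # example mount point; adjust to your setup

    # mark /home so that each immediate child is ephemerally pinned
    # to some active MDS rank
    os.setxattr(os.path.join(MOUNT, "home"),
                "ceph.dir.pin.distributed", b"1")

    # an explicit (manual) pin of a single child to rank 2, by
    # contrast, would be:
    # os.setxattr(os.path.join(MOUNT, "home/X"), "ceph.dir.pin", b"2")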

>
> I tried it with octopus and the cache for directories under /home/X/ was all over the place. Nothing was pinned to a single rank, and on top of that the sub-trees were distributed extremely unevenly across ranks and their number was excessively large. Only after I set an explicit pin on every child /home/X of /home was all cache information about all subdirs of /home/X/ handled by the MDS I pinned it to.

The directories (children) are spread across MDSs based on a
(consistent) hash of each child's inode number. The distribution
should be uniform across ranks.
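
Very roughly, the idea is the following (an illustration of the scheme
only, not the actual MDS code, which uses its own hash function):

    import zlib

    def target_rank(child_inode_no: int, num_active_mds: int) -> int:
        """Map a child's inode number to an MDS rank with a stable
        hash, so the same child always lands on the same rank as
        long as the number of active MDSs does not change."""
        key = child_inode_no.to_bytes(8, "little")
        return zlib.crc32(key) % num_active_mds

    # e.g. with 3 active ranks, /home/X with inode 0x10000000abc
    # always maps to the same rank:
    # target_rank(0x10000000abc, 3)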

>
> What should the result of distributed ephemeral pinning actually be when set on /home?
> What would be different between octopus and quincy?

It's an implementation difference. In octopus, each child dir (direct
descendant of the ephemerally pinned directory) is pinned to a target
MDS based on the hash of its (the child dir's) inode number. From
pacific onwards, the dirfrags are distributed across ranks, which
limits the number of subtrees.
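
If you want to verify where the children actually ended up, dumping
the subtree map is the easiest way. A rough sketch (the rank being
queried and the JSON field names are from a recent release and may
differ slightly on older ones):

    import collections, json, subprocess

    # ask rank 0 for the subtree map it knows about
    out = subprocess.run(["ceph", "tell", "mds.0", "get", "subtrees"],
                         capture_output=True, text=True,
                         check=True).stdout

    # count subtrees under /home per authoritative rank
    per_rank = collections.Counter()
    for st in json.loads(out):
        path = st.get("dir", {}).get("path", "")
        if path.startswith("/home"):
            per_rank[st.get("auth_first", -1)] += 1

    for rank, count in sorted(per_rank.items()):
        print(f"rank {rank}: {count} subtrees under /home")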

> Is the documentation (for octopus) misleading or does the implementation not match documentation?

I think the docs are fine - the quincy docs do mention that the
directory fragments are distributed while the octopus docs do not. I
agree the wording is a bit subtle.

>
> Thanks for any insight!
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Venky Shankar <vshankar@xxxxxxxxxx>
> Sent: 29 November 2022 10:09:21
> To: Frank Schilder
> Cc: Reed Dier; ceph-users
> Subject: Re:  Re: MDS stuck ops
>
> On Tue, Nov 29, 2022 at 1:42 PM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi Venky.
> >
> > > You most likely ran into performance issues with distributed ephemeral
> > > pins with octopus. It'd be nice to try out one of the latest releases
> > > for this.
> >
> > I run into the problem that distributed ephemeral pinning seems not actually implemented in octopus. This mode didn't pin anything, see also the recent conversation with Patrick:
>
> Distributed ephemeral pins used to distribute the inodes under a directory
> amongst MDSs, which had scalability issues due to the sheer number of
> subtrees. This was changed to distribute dirfrags, and I think those
> changes were not in octopus.
>
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YEB34F5SREAOOMATOKC6NO3G2GVCSOOZ
> >
> > I sent him a couple of dumps, but am not sure if he is doing anything with them. I wrote a small script to do the distributed pinning by hand and it solved all sorts of problems.
>
> Distributing dirfrags solved a lot of scalability issues and those
> changes are available in pacific and beyond. We aren't backporting to
> octopus anymore, so the options are limited.
>
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
>
>
> --
> Cheers,
> Venky
>


-- 
Cheers,
Venky

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



