Re: MDS stuck ops

Hi Venky,

thanks for taking the time. I'm afraid I still don't get the difference. Maybe the Ceph dev terminology means something different from what I mean. Let's look at this statement; I think it summarises my misery quite well:

> It's an implementation difference. In octopus, each child dir (direct
> descendant of the ephemerally pinned directory) is pinned to a target
> MDS based on the hash of its (child dir) inode number. From pacific
> onwards, the dirfrags are distributed across ranks. This limits the
> number of subtrees.

Let's say we have /home/{a..c} and I enable ephemeral pinning on /home. Let's also say that each of /home/{a..c} has a number of directory fragments, maybe somewhere deeper down in the hierarchy. As far as I understand it, distributed ephemeral pinning means that a static pin based on a hash function is assigned to each of /home/{a..c}, which is then inherited by all of their child directories. This means that all directories under /home/a/ have the same effective static pin as /home/a, and likewise for /home/b/... and /home/c/...
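The model described in the previous paragraph can be written down as a small Python toy. It is a hypothetical sketch, not Ceph code: the inode numbers are made up, `zlib.crc32` is a stand-in for whatever hash the MDS really uses, and explicit `ceph.dir.pin` overrides are ignored.

```python
import posixpath
import zlib
from typing import Dict, Optional, Set


def ephemeral_rank(ino: int, num_ranks: int) -> int:
    # Stand-in hash of the child's inode number (NOT the MDS's real hash).
    return zlib.crc32(ino.to_bytes(8, "little")) % num_ranks


def effective_rank(path: str, inodes: Dict[str, int],
                   distributed: Set[str], num_ranks: int) -> Optional[int]:
    """Rank that `path` would be pinned to under the inherited-pin model."""
    node = posixpath.normpath(path)
    while node != "/":
        parent = posixpath.dirname(node)
        if parent in distributed:
            # `node` is an immediate child of a distributed directory: its
            # hash-based pin is inherited by everything beneath it.
            return ephemeral_rank(inodes[node], num_ranks)
        node = parent
    return None  # no ephemeral pin applies to this path
```

Under this model, `effective_rank("/home/a/x/y", ...)` always equals `effective_rank("/home/a", ...)`, so nothing below /home/a could land on another rank.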

To me, this implies that any directory fragment that is a descendant of /home/a is also pinned to the same MDS as /home/a. I really don't understand the difference between "each child dir (direct descendant of the ephemerally pinned directory) is pinned to a target MDS" (octopus) and "the dirfrags are distributed across ranks" (pacific). In other words, if /home/a is assigned a rank pin and all of its descendants inherit this rank pin, how can any directory fragment of (a descendant of) /home/a end up on an MDS different from the one assigned to /home/a?

What I observed is that /home/a/.../xyz and /home/a/..../uvw ended up on different ranks, and none of the descriptions I have seen so far explain why this is expected. All explanations I have seen state that these should be on the same MDS in both octopus and pacific.

It would be great if you could help me out here. Maybe it really is just terminology?

Thanks a lot for your time again!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Venky Shankar <vshankar@xxxxxxxxxx>
Sent: 29 November 2022 15:54:12
To: Frank Schilder
Cc: Reed Dier; ceph-users
Subject: Re: Re: MDS stuck ops

Hi Frank,

On Tue, Nov 29, 2022 at 5:38 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Venky,
>
> maybe you can help me clarify the situation a bit. I don't understand the difference between the two pinning implementations you describe in your reply, and I also don't see any difference in meaning between the octopus and quincy documentation; the difference is only in the wording. Both texts state that "all of a directory’s immediate children should be ephemerally pinned" (octopus) and "This has the effect of distributing immediate children across a range of MDS ranks" (quincy).
>
> To me, both mean that, if I enable distributed ephemeral pinning on /home, then for every child /home/X of /home, /home/X and every directory under /home/X/ are pinned to the same MDS rank. This means their cached metadata exists on this rank only, and no other MDS serves requests for any of these directories.
>
> Is there something wrong with this interpretation?

Distributed ephemeral pins will distribute immediate children across a
range of MDS ranks - /home/X might be on rank 1, /home/Y on rank 2,
/home/Z on rank 0, and so on.
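For comparison, the same kind of spread can be forced by hand with an explicit `ceph.dir.pin` on each immediate child, which is roughly the manual workaround mentioned elsewhere in this thread. The sketch below is hypothetical (it is not that script, and `plan_explicit_pins` is an invented name): it assigns ranks round-robin in sorted name order and only calls `os.setxattr` when `apply=True`, so it can be dry-run on any filesystem.

```python
import os
from typing import Dict


def plan_explicit_pins(parent: str, num_ranks: int,
                       apply: bool = False) -> Dict[str, int]:
    """Assign each immediate child directory of `parent` to an MDS rank.

    Hypothetical helper. With apply=True it sets the CephFS `ceph.dir.pin`
    extended attribute on every child, which needs a CephFS mount and
    sufficient client capabilities.
    """
    with os.scandir(parent) as entries:
        children = sorted(e.name for e in entries
                          if e.is_dir(follow_symlinks=False))
    plan: Dict[str, int] = {}
    for i, name in enumerate(children):
        rank = i % num_ranks  # round-robin: counts per rank differ by at most 1
        path = os.path.join(parent, name)
        plan[path] = rank
        if apply:
            # Same effect as: setfattr -n ceph.dir.pin -v <rank> <path>
            os.setxattr(path, "ceph.dir.pin", str(rank).encode())
    return plan
```

Round-robin gives perfectly even counts, but adding a child can shift later assignments; a hash of the inode number, as the ephemeral pin uses, keeps existing children where they are.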

>
> I tried it with octopus and the cache for directories under /home/X/ was all over the place. Nothing was pinned to a single rank, and on top of that the subtrees were extremely unevenly distributed and excessive in number. Only after I set an explicit pin on every child /home/X of /home was all cache information about the subdirectories of /home/X/ handled by the MDS I had pinned it to.

The directories (children) are spread across MDSs based on the
(consistent) hash of their inode numbers. The distribution should be
uniform across ranks.
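To illustrate what a consistent hash of the inode number provides, here is jump consistent hash (Lamping and Veach). It is a stand-in chosen for illustration; it is not claimed to be the hash the MDS actually uses. It spreads keys roughly uniformly across ranks, and when a rank is added, only the keys that move to the new rank change placement.

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a non-negative integer key (e.g. an inode number) to [0, num_buckets)."""
    key &= 0xFFFFFFFFFFFFFFFF  # treat the key as an unsigned 64-bit value
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step drives the pseudo-random jumps.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int(float(b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

For example, `jump_consistent_hash(ino, 3)` picks one of ranks 0 to 2; growing to 4 ranks moves roughly a quarter of the keys, all of them to rank 3.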

>
> What should the result of distributed ephemeral pinning actually be when set on /home?
> What would be different between octopus and quincy?

It's an implementation difference. In octopus, each child dir (direct
descendant of the ephemerally pinned directory) is pinned to a target
MDS based on the hash of its (child dir) inode number. From pacific
onwards, the dirfrags are distributed across ranks. This limits the
number of subtrees.
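A toy model of why distributing dirfrags limits the number of subtrees: if every child is its own pinned subtree root, the count grows with the number of children; if the parent's fragments are pinned instead, the count is bounded by the number of fragments. Everything here is hypothetical: `zlib.crc32` stands in for the MDS's dentry-name and rank hashes, and a fixed split into `2**frag_bits` fragments is assumed.

```python
import zlib
from typing import Dict


def _h(data: str) -> int:
    # Stand-in hash; the real MDS uses its own hash functions.
    return zlib.crc32(data.encode())


def per_child_pins(children: Dict[str, int], num_ranks: int) -> Dict[str, int]:
    """Octopus-style model: one subtree root per child, ranked by inode hash."""
    return {name: _h(str(ino)) % num_ranks for name, ino in children.items()}


def dirfrag_pins(children: Dict[str, int], num_ranks: int,
                 frag_bits: int) -> Dict[int, int]:
    """Pacific-style model: children grouped into fragments by a hash of their
    name; each fragment (not each child) becomes a pinned subtree root."""
    frags = {_h(name) % (1 << frag_bits) for name in children}
    return {frag: _h(f"frag-{frag}") % num_ranks for frag in frags}


def child_rank_by_frag(name: str, frag_pins: Dict[int, int],
                       frag_bits: int) -> int:
    # A child's rank follows the fragment its name hashes into.
    return frag_pins[_h(name) % (1 << frag_bits)]
```

With 1000 children and 3 fragment bits, the per-child model produces 1000 subtree roots while the fragment model produces at most 8.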

> Is the documentation (for octopus) misleading or does the implementation not match documentation?

I think the docs are fine: the quincy docs do mention that the directory
fragments are distributed, while the octopus docs do not. I agree, the
wording is a bit subtle.

>
> Thanks for any insight!
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Venky Shankar <vshankar@xxxxxxxxxx>
> Sent: 29 November 2022 10:09:21
> To: Frank Schilder
> Cc: Reed Dier; ceph-users
> Subject: Re: Re: MDS stuck ops
>
> On Tue, Nov 29, 2022 at 1:42 PM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi Venky.
> >
> > > You most likely ran into performance issues with distributed ephemeral
> > > pins with octopus. It'd be nice to try out one of the latest releases
> > > for this.
> >
> > I ran into the problem that distributed ephemeral pinning does not actually seem to be implemented in octopus. This mode didn't pin anything; see also the recent conversation with Patrick:
>
> Distributed ephemeral pins used to distribute inodes under a directory
> amongst MDSs, which had scalability issues due to the sheer number of
> subtrees. This was changed to distribute dirfrags, and I think those
> changes were not in octopus.
>
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YEB34F5SREAOOMATOKC6NO3G2GVCSOOZ
> >
> > I sent him a couple of dumps, but am not sure if he is doing anything with them. I wrote a small script to do the distributed pinning by hand, and it solved all sorts of problems.
>
> Distributing dirfrags solved a lot of scalability issues and those
> changes are available in pacific and beyond. We aren't backporting to
> octopus anymore, so the options are limited.
>
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
>
>
> --
> Cheers,
> Venky
>


--
Cheers,
Venky

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



