Hi Venky,

thanks for taking the time. I'm afraid I still don't get the difference. Maybe the Ceph dev terminology means something other than what I use. Let's look at this statement; I think it summarises my misery quite well:

> It's an implementation difference. In octopus, each child dir (direct
> descendant of the ephemerally pinned directory) is pinned to a target
> MDS based on the hash of its (child dir) inode number. From pacific
> onwards, the dirfrags are distributed across ranks. This limits the
> number of subtrees.

Let's say we have /home/{a..c} and I enable ephemeral pinning on /home. Let's also say that each of /home/{a..c} has a number of directory fragments, maybe somewhere deeper down in the hierarchy. As far as I understand it, ephemeral distributed pinning means that a static pin based on a hash function is assigned to each of /home/{a..c}, which, in turn, is then inherited by all of their child directories. Meaning that all directories under /home/a/ have the same effective static pin as /home/a, and likewise for /home/b/... and /home/c/... To me, this implies that any directory fragment that is a descendant of /home/a is also pinned to the same MDS as /home/a.

I really don't understand what the difference is between "each child dir (direct descendant of the ephemerally pinned directory) is pinned to a target MDS" (octopus) and "the dirfrags are distributed across ranks" (pacific). In other words, if /home/a is assigned a rank pin and all of its descendants inherit this rank pin, how can any directory fragment of (a descendant of) /home/a end up on an MDS that is different from the one assigned to /home/a? What I observed is that /home/a/.../xyz and /home/a/..../uvw ended up on different ranks, and none of the descriptions I have seen so far explain why this is expected. All explanations I have seen state that these should be on the same MDS in both octopus and pacific. It would be great if you could help me out here.
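A toy model may help pin down the two descriptions being compared. This is not the MDS code: the hash (`stable_hash`, SHA-256 here), the helper names, the inode numbers and the fixed fragment count are all hypothetical stand-ins. It models the octopus description (each immediate child of the pinned directory is placed by a hash of its own inode number), the pacific description (the pinned directory's dirfrags are placed, and each child lands on the rank of the fragment its name hashes into), and the inheritance described above, where every descendant of /home/a takes the rank of /home/a.

```python
from __future__ import annotations

import hashlib
from pathlib import PurePosixPath


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (hypothetical stand-in for the MDS's hash)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def octopus_style_rank(child_ino: int, num_ranks: int) -> int:
    """Octopus description: each immediate child is placed by a hash of
    its own inode number, so every child becomes its own pinned subtree."""
    return stable_hash(f"ino:{child_ino}") % num_ranks


def frag_of(child_name: str, frag_bits: int) -> int:
    """Which of the parent's 2**frag_bits fragments holds this dentry:
    the top frag_bits bits of a hash of the name (0 when frag_bits == 0)."""
    return stable_hash(f"name:{child_name}") >> (64 - frag_bits)


def pacific_style_rank(child_name: str, parent_ino: int,
                       frag_bits: int, num_ranks: int) -> int:
    """Pacific description: the parent's dirfrags are placed, and a child
    lands on the rank of the fragment its name falls into."""
    frag = frag_of(child_name, frag_bits)
    return stable_hash(f"frag:{parent_ino}:{frag}") % num_ranks


def effective_rank(path: str, pinned_root: str,
                   child_rank: dict[str, int]) -> int | None:
    """Inheritance: a directory takes the rank of its ancestor that is an
    immediate child of the distributed-pinned root."""
    rel = PurePosixPath(path).relative_to(pinned_root)
    if not rel.parts:
        return None  # the pinned root itself is not placed by this model
    return child_rank[rel.parts[0]]


if __name__ == "__main__":
    children = {"a": 0x1001, "b": 0x1002, "c": 0x1003}  # hypothetical inodes
    home_ino, num_ranks, frag_bits = 0x1000, 3, 1
    octo = {n: octopus_style_rank(i, num_ranks) for n, i in children.items()}
    paci = {n: pacific_style_rank(n, home_ino, frag_bits, num_ranks)
            for n in children}
    for label, ranks in (("octopus model", octo), ("pacific model", paci)):
        xyz = effective_rank("/home/a/x/xyz", "/home", ranks)
        uvw = effective_rank("/home/a/y/uvw", "/home", ranks)
        print(label, ranks, "same rank for xyz and uvw:", xyz == uvw)
    used_frags = {frag_of(n, frag_bits) for n in children}
    print("pinned units: octopus", len(children), "pacific", len(used_frags))
```

In both models of this sketch, anything under /home/a resolves to the same rank; what differs is the number of pinned units: one per child in the octopus model, at most 2**frag_bits in the pacific model, which is one way to read "this limits the number of subtrees". The sketch does not reproduce the observation of descendants of /home/a landing on different ranks.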
Maybe it really is just terminology?

Thanks a lot for your time again!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Venky Shankar <vshankar@xxxxxxxxxx>
Sent: 29 November 2022 15:54:12
To: Frank Schilder
Cc: Reed Dier; ceph-users
Subject: Re: Re: MDS stuck ops

Hi Frank,

On Tue, Nov 29, 2022 at 5:38 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Venky,
>
> maybe you can help me clarify the situation a bit. I don't understand the difference between the two pinning implementations you describe in your reply, and I also don't see any difference in meaning in the documentation between octopus and quincy; the difference is just in wording. Both texts state that "all of a directory’s immediate children should be ephemerally pinned" (octopus) and "This has the effect of distributing immediate children across a range of MDS ranks" (quincy).
>
> To me, both mean that, if I enable distributed ephemeral pinning on /home, then for every child /home/X of /home it follows that /home/X and any directory under /home/X/ are pinned to the same MDS rank. Meaning their information in cache exists on this rank only and no other MDS is serving requests for any of these directories.
>
> Is there something wrong with this interpretation?

Distributed ephemeral pins will distribute immediate children across a range of MDS ranks: /home/X might be on rank 1, /home/Y on rank 2, /home/Z on rank 0, and so on.

>
> I tried it with octopus and the cache for directories under /home/X/ was all over the place. Nothing was pinned to a single rank, and on top of that the number of sub-trees was extremely unevenly assigned and excessively large. Only after I set an explicit pin on every child /home/X of /home was all cache information about all subdirs of /home/X/ handled by the MDS I pinned it to.

The directories (children) are spread across MDSs based on the (consistent) hash of their inode numbers.
The distribution should be uniform across ranks.

>
> What should the result of distributed ephemeral pinning actually be when set on /home?
> What would be different between octopus and quincy?

It's an implementation difference. In octopus, each child dir (direct
descendant of the ephemerally pinned directory) is pinned to a target
MDS based on the hash of its (child dir) inode number. From pacific
onwards, the dirfrags are distributed across ranks. This limits the
number of subtrees.

> Is the documentation (for octopus) misleading or does the implementation not match the documentation?

I think the docs are fine; the quincy docs do mention that the directory fragments are distributed, while the octopus docs do not. I agree, the wording is a bit subtle.

>
> Thanks for any insight!
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Venky Shankar <vshankar@xxxxxxxxxx>
> Sent: 29 November 2022 10:09:21
> To: Frank Schilder
> Cc: Reed Dier; ceph-users
> Subject: Re: Re: MDS stuck ops
>
> On Tue, Nov 29, 2022 at 1:42 PM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi Venky.
> >
> > > You most likely ran into performance issues with distributed ephemeral
> > > pins with octopus. It'd be nice to try out one of the latest releases
> > > for this.
> >
> > I ran into the problem that distributed ephemeral pinning does not seem to be actually implemented in octopus. This mode didn't pin anything; see also the recent conversation with Patrick:
>
> Distributed ephemeral pins used to distribute inodes under a directory
> amongst MDSs, which had scalability issues due to the sheer number of
> subtrees. This was changed to distribute dirfrags and I think those
> changes were not in octopus.
>
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YEB34F5SREAOOMATOKC6NO3G2GVCSOOZ
> >
> > I sent him a couple of dumps, but am not sure if he is doing anything with them.
> > I wrote a small script to do the distributed pinning by hand and it solved all sorts of problems.
>
> Distributing dirfrags solved a lot of scalability issues and those
> changes are available in pacific and beyond. We aren't backporting to
> octopus anymore, so the options are limited.
>
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
>
> --
> Cheers,
> Venky

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
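Frank's script itself is not part of the thread. The following is a hypothetical sketch of doing distributed pinning by hand along the lines he describes: set an explicit export pin on every immediate child of /home. It uses the CephFS `ceph.dir.pin` extended attribute (the one `setfattr -n ceph.dir.pin -v <rank> <dir>` sets) through Python's Linux-only `os.setxattr`. The script name `pin_children.py`, the round-robin assignment and the `--apply` flag are assumptions of this sketch; by default it only prints the equivalent `setfattr` commands.

```python
import os
import sys

PIN_XATTR = "ceph.dir.pin"  # CephFS export-pin extended attribute


def plan_pins(child_names, num_ranks):
    """Assign ranks round-robin over the sorted child names.

    Round-robin is this sketch's choice; the MDS's own ephemeral
    distribution hashes inode numbers instead."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    return {name: i % num_ranks for i, name in enumerate(sorted(child_names))}


def list_child_dirs(root):
    """Names of the immediate subdirectories of root (symlinks not followed)."""
    with os.scandir(root) as entries:
        return [e.name for e in entries if e.is_dir(follow_symlinks=False)]


def apply_pins(root, plan, dry_run=True):
    """Pin each child to its planned rank; in dry-run mode only print
    the equivalent setfattr command."""
    for name, rank in sorted(plan.items()):
        path = os.path.join(root, name)
        if dry_run:
            print(f"setfattr -n {PIN_XATTR} -v {rank} {path}")
        else:
            # Linux-only; needs a CephFS mount that exposes the ceph.* vxattrs.
            os.setxattr(path, PIN_XATTR, str(rank).encode())


def main(argv):
    # Hypothetical CLI: pin_children.py ROOT NUM_RANKS [--apply]
    if len(argv) < 3:
        print("usage: pin_children.py ROOT NUM_RANKS [--apply]")
        return 2
    root, num_ranks = argv[1], int(argv[2])
    plan = plan_pins(list_child_dirs(root), num_ranks)
    apply_pins(root, plan, dry_run="--apply" not in argv[3:])
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python3 pin_children.py /mnt/cephfs/home 3` (mount path hypothetical) prints one command per child; adding `--apply` sets the pins. Round-robin over sorted names keeps the per-rank counts equal, but re-running it after new directories appear can move existing subtrees to other ranks; hashing inode numbers, as the ephemeral policy enabled with `ceph.dir.pin.distributed` does, avoids that at the cost of less even counts.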