Re: CephFS: effects of using hard links

On Thu, Mar 21, 2019 at 2:45 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 21, 2019 at 8:51 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > On Wed, Mar 20, 2019 at 6:06 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >>
> >> On Tue, Mar 19, 2019 at 9:43 AM Erwin Bogaard <erwin.bogaard@xxxxxxxxx> wrote:
> >> >
> >> > Hi,
> >> >
> >> >
> >> >
> >> > For a number of applications we use, there is a lot of file duplication. This wastes precious storage space, which I would like to avoid.
> >> >
> >> > When using a local disk, I can use hard links to make all duplicate files point to the same inode (using “rdfind”, for example).
> >> >
> >> >
> >> >
> >> > As there isn’t any deduplication in Ceph(FS), I’m wondering if I can use hard links on CephFS in the same way as I do on ‘regular’ file systems like ext4 and XFS.
> >> >
> >> > 1. Is it advisable to use hard links on CephFS? (It isn’t mentioned in the ‘best practices’: http://docs.ceph.com/docs/master/cephfs/app-best-practices/)
> >> >
> >> > 2. Is there any performance (dis)advantage?
> >> >
> >> > 3. When using hard links, are there actual space savings, or is there some trickery happening?
> >> >
> >> > 4. Are there any issues (other than the regular hard link ‘gotchas’) I need to keep in mind when combining hard links with CephFS?
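
For reference, the rdfind approach Erwin mentions looks roughly like
this on any mount, CephFS included -- the path below is just a
placeholder, and a dry run first shows what would actually get linked:

    # preview which duplicates rdfind would replace with hard links
    rdfind -dryrun true -makehardlinks true /mnt/cephfs/app-data

    # then run it for real
    rdfind -makehardlinks true /mnt/cephfs/app-data

Hard links can only exist within a single file system, so all the
duplicates need to live under the same CephFS file system.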
> >>
> >> The only issue we've seen is that if you hard link b to a, then rm a,
> >> and never stat b, the inode is added to the "stray" directory. By
> >> default there is a limit of 1 million stray entries -- so if you
> >> accumulate files in this state, eventually users will be unable to rm
> >> any files until you stat the `b` files.
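
A minimal way to reproduce what Dan describes, assuming a CephFS mount
at /mnt/cephfs and access to the MDS admin socket (the paths and
mds.<name> are placeholders):

    # create a file, hard link it elsewhere, then remove the original name
    echo data > /mnt/cephfs/dir1/a
    ln /mnt/cephfs/dir1/a /mnt/cephfs/dir2/b
    rm /mnt/cephfs/dir1/a

    # the inode should now show up in the MDS stray count
    ceph daemon mds.<name> perf dump mds_cache | grep num_strays

    # stat'ing the surviving link lets the MDS reintegrate the inode
    stat /mnt/cephfs/dir2/b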
> >
> >
> > Eek. Do you know if we have any tickets about that issue? It's easy to see how that happens but definitely isn't a good user experience!
>
> I'm not aware of a ticket -- I had thought it was just a fact of life
> with hardlinks and cephfs.

I think it is for now, but as you've demonstrated that's not really a
good situation, and I'm sure we can figure out some way of
automatically merging inodes back into their remaining link parents.
I've created a ticket at http://tracker.ceph.com/issues/38849

> After hitting this issue in prod, we found the explanation here in
> this old thread (with your useful post ;) ):
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013621.html
>
> Our immediate workaround was to increase mds bal fragment size max
> (e.g. to 200000).
> In our env we now monitor num_strays in case these get out of control again.
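
For anyone else hitting this, that workaround translates to roughly the
following (mds.<name> is a placeholder; the option defaults to 100000
entries per fragment):

    # raise the limit at runtime
    ceph tell mds.* injectargs '--mds_bal_fragment_size_max 200000'

    # and make it persistent in ceph.conf on the MDS hosts:
    #   [mds]
    #   mds bal fragment size max = 200000

    # keep an eye on the stray count
    ceph daemon mds.<name> perf dump mds_cache | grep num_strays

Raising the limit only buys headroom, though -- the strays only go away
once the surviving links get stat'ed again.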
>
> BTW, now thinking about this more... isn't directory fragmentation
> supposed to let the stray dir grow to an unlimited number of shards?
> (On our side it seems limited to 10 shards.) Maybe this is just a
> configuration issue on our side?

Sounds like I haven't missed a change here: the stray directory is a
special system directory that doesn't get fragmented like normal ones
do. We just set it up (hard-coded even, IIRC, but maybe a config
option) so that each MDS gets 10 of them after the first time somebody
managed to make it large enough that a single stray directory object
got too large. o_0
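
If you want to poke at them, the ten stray dirs for rank 0 should show
up as objects 600.00000000 through 609.00000000 in the metadata pool
(the pool name below is a placeholder), so you can count the dentries
per shard with something like:

    for i in 0 1 2 3 4 5 6 7 8 9; do
        rados -p cephfs_metadata listomapkeys 60$i.00000000 | wc -l
    done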
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



