Re: cephfs: [ERR] loaded dup inode

A quick update: the stray entries are blowing up. Since my last e-mail 4 hours ago, the stray count has increased from 1021751 to 1036754. That's 15K more within a few hours and still rising fast. I'm running an "ls -laR" on a directory with anaconda installed and I'm starting to get the impression that my attempts are counterproductive.
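
In case it is useful to others, this is how I'm watching the counter; a minimal sketch, assuming the admin socket of the active MDS (here mds.ceph-08) is reachable on the local host and that jq is installed:

# the stray counters live under mds_cache in the MDS perf dump
ceph daemon mds.ceph-08 perf dump | jq .mds_cache.num_strays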

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 18 January 2022 13:58:29
To: Dan van der Ster
Cc: Patrick Donnelly; ceph-users
Subject:  Re: cephfs: [ERR] loaded dup inode

Hi Dan,

thanks for this info, it's a start. As you probably know, though, the output format is about the most inconvenient possible for further processing. Is there really no low-level tool to investigate the file system data structures in a more reasonable way? What do the devs use for debugging?

Now to a first discovery:

00000190  00 00 2b 00 00 00 2f 68  70 63 2f 68 6f 6d 65 2f  |..+.../hpc/home/|
000001a0  XX XX XX XX 2f 61 6e 61  63 6f 6e 64 61 33 2f 69  |XXXX/anaconda3/i|
000001b0  6e 63 6c 75 64 65 2f 61  75 74 6f 74 65 73 74 2e  |nclude/autotest.|
000001c0  68 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |h...............|

The hex dump of the first omap value contains a file named "/hpc/home/XXXXX/anaconda3/include/autotest.h" (user name obscured) which exists neither on the file system itself nor in any of its snapshots. In fact, the folder "include" does not exist anywhere. How is this possible? There are loads of omap entries referring to this directory.
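
For reference, this is how I'm skimming the stray objects for readable paths; a rough sketch, assuming a single active MDS and our metadata pool con-fs2-meta1 (rados listomapvals prints each value as a hex dump, so the ASCII column can be grepped directly):

# filter the omap dump of the first stray dir for the suspicious directory
rados listomapvals -p con-fs2-meta1 600.00000000 | grep -a anaconda3 | less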

> It's safe to increase mds_bal_fragment_size_max to 200000 if that ...

I already set it to 150000. However, at the current growth rate of stray entries (currently 1021751), a value of 200000 will buy me maybe 10 months. I would prefer a more sustainable solution.

As a last question here: I ran a "find /mnt/cephfs/hpc/home", which seemed to have some effect. However, I just completed an "ls -lR /mnt/cephfs/hpc/home/XXXXX/anaconda3" following the finding above, which reduced the stray count by only about 5000. It seems that listing the directory contents is not enough to trigger reintegration. What is the cheapest operation I need to execute on a file or directory to trigger reintegration?

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 18 January 2022 11:03:21
To: Frank Schilder
Cc: Patrick Donnelly; ceph-users
Subject: Re:  Re: cephfs: [ERR] loaded dup inode

Hi Frank,

If you have one active MDS, the stray dir objects in the meta pool are named:

600.00000000
601.00000000
...
609.00000000

So you can, e.g., run `rados listomapvals -p con-fs2-meta1 600.00000000`
to get an idea of the stray files.
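
To get the counts per stray dir, something like this should do (an
untested sketch, assuming a single active MDS so that only the ten
objects above exist):

# count the dentries held in each stray dir object
for i in $(seq 0 9); do
    echo -n "60${i}.00000000: "
    rados listomapkeys -p con-fs2-meta1 "60${i}.00000000" | wc -l
done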

Each of those stray dirs holds up to mds_bal_fragment_size_max entries.
Once they are full you'll get ENOSPC on rm.
It's safe to increase mds_bal_fragment_size_max to 200000 if that
starts to happen.
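
Untested, off the top of my head, something like this should set it
(injectargs applies it at runtime; on mimic you can also persist it in
the config database or in ceph.conf):

# apply at runtime on all MDS daemons
ceph tell mds.* injectargs '--mds_bal_fragment_size_max=200000'
# persist in the central config database
ceph config set mds mds_bal_fragment_size_max 200000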

Cheers, Dan


On Tue, Jan 18, 2022 at 10:53 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Dan and Patrick,
>
> this problem seems to be developing into a nightmare. I executed a find on the file system and had some initial success: the number of stray files dropped by about 8%. Unfortunately, that is about it. I'm now running a find on the snap dirs as well, but I don't have much hope. There must be a way to find out what is accumulating in the stray buckets. As I wrote in another reply to this thread, I can't dump the trees:
>
> > I seem to have a problem. I cannot dump the mds tree:
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0' 0
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir' 0
> > root inode is not in cache
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 get subtrees | grep path
> >             "path": "",
> >             "path": "~mds0",
> >
>
> However, this information is stored somewhere in RADOS objects, and it should be possible to figure something out along the lines of
>
> # rados getxattr --pool=con-fs2-meta1 <OBJ_ID> parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
> # rados listomapkeys --pool=con-fs2-meta1 <OBJ_ID>
>
> What OBJ_IDs am I looking for? Where and how can I start traversing the structure? The version is the latest stable mimic.
>
> Thanks for your help,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 17 January 2022 09:35:02
> To: Patrick Donnelly
> Cc: Frank Schilder; ceph-users
> Subject: Re:  Re: cephfs: [ERR] loaded dup inode
>
> On Sun, Jan 16, 2022 at 3:54 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> >
> > Hi Dan,
> >
> > On Fri, Jan 14, 2022 at 6:32 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > We had this long ago related to a user generating lots of hard links.
> > > Snapshots will have a similar effect.
> > > (in these cases, if a user deletes the original file, the file goes
> > > into stray until it is "reintegrated").
> > >
> > > If you can find the dir where they're working, `ls -lR` will force
> > > those to reintegrate (you will see because the num strays will drop
> > > back down).
> > > You might have to ls -lR in a snap directory, or in the current tree
> > > -- you have to browse around and experiment.
> > >
> > > Pacific does this reintegration automatically.
> >
> > This reintegration is still not automatic (i.e., the MDS does not yet
> > have a mechanism for hunting down the dentry to reintegrate).
> > The next version (planned) of Pacific will have reintegration
> > triggered by recursive scrub:
> >
> > https://github.com/ceph/ceph/pull/44514
> >
> > which is significantly less disruptive than `ls -lR` or `find`.
>
> Oops, sorry, my bad.
> I was thinking about https://github.com/ceph/ceph/pull/33479
>
> Cheers, Dan
>
>
> >
> > --
> > Patrick Donnelly, Ph.D.
> > He / Him / His
> > Principal Software Engineer
> > Red Hat, Inc.
> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx