Hi Dan,

thanks for this info, it's a start. I guess you know that the output format is probably the most inconvenient for further processing. Is there really no low-level tool to investigate the file system data structures in a more reasonable way? What are the devs using for debugging?

Now to a first discovery:

00000190 00 00 2b 00 00 00 2f 68 70 63 2f 68 6f 6d 65 2f |..+.../hpc/home/|
000001a0 XX XX XX XX 2f 61 6e 61 63 6f 6e 64 61 33 2f 69 |XXXX/anaconda3/i|
000001b0 6e 63 6c 75 64 65 2f 61 75 74 6f 74 65 73 74 2e |nclude/autotest.|
000001c0 68 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |h...............|

The hex dump of the first omap value contains a file named "/hpc/home/XXXXX/anaconda3/include/autotest.h" (user name obscured), which exists neither on the file system itself nor in any of its snapshots. In fact, the folder "include" does not exist anywhere. How is this possible? There are loads of omap entries from this directory.

> It's safe to increase mds_bal_fragment_size_max to 200000 if that ...

I already set it to 150000. However, with the current growth rate of stray entries (currently 1021751) a value of 200000 will give me maybe 10 months. I would prefer a more sustainable solution.

As a last question here: I ran a "find /mnt/cephfs/hpc/home", which seemed to have some effect. However, I just completed an "ls -lR /mnt/cephfs/hpc/home/XXXXX/anaconda3" following the finding above, which gave a reduction of stray entries by only about 5000. It seems that listing the directory contents is not enough to trigger a reintegration. What is the cheapest operation I need to execute on a file or directory to trigger reintegration?

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 18 January 2022 11:03:21
To: Frank Schilder
Cc: Patrick Donnelly; ceph-users
Subject: Re: Re: cephfs: [ERR] loaded dup inode

Hi Frank,

If you have one active MDS, the stray dir objects in the meta pool are named:

600.00000000
601.00000000
...
609.00000000

So you can e.g. run `rados listomapvals -p con-fs2-meta1 600.00000000` to get an idea about the stray files.

Each of those stray dirs holds up to mds_bal_fragment_size_max entries. After they are full you'll get ENOSPACE on rm. It's safe to increase mds_bal_fragment_size_max to 200000 if that starts to happen.

Cheers, Dan
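
As a quick way to see how full each of those stray directories actually is, the object names above can be turned into a small counting loop. This is only a sketch: it assumes a single active MDS (hence exactly the objects 600.00000000 through 609.00000000) and the con-fs2-meta1 metadata pool used in this thread; adjust the pool name for other clusters.

for i in 0 1 2 3 4 5 6 7 8 9; do
    obj="60${i}.00000000"
    # every omap key in a stray dirfrag object is one stray dentry
    printf '%s: %s\n' "${obj}" "$(rados -p con-fs2-meta1 listomapkeys "${obj}" | wc -l)"
done

The sum of the ten counts should roughly match the total number of stray entries the MDS reports.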
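
And as a possible starting point for the "What OBJ_IDs am I looking for?" question further down: the omap keys in those stray objects are typically the dentry names with a "_head" suffix, and for stray entries the dentry name is usually the inode number in hex. If that holds here, the backtrace of such an inode can be decoded with the getxattr/ceph-dencoder pipeline already quoted in this thread: from the metadata pool for directories, and from the data pool for regular files. A rough sketch, with <INO> standing for one of those hex inode numbers and <DATA_POOL> for the file system's data pool:

# list a few stray dentry names; they typically look like "<INO>_head"
rados -p con-fs2-meta1 listomapkeys 600.00000000 | head

# decode the backtrace ("parent" xattr) of that inode's first object:
# directories keep it in the metadata pool ...
rados -p con-fs2-meta1 getxattr <INO>.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
# ... regular files keep it on their first object in the data pool
rados -p <DATA_POOL> getxattr <INO>.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
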
On Tue, Jan 18, 2022 at 10:53 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Dan and Patrick,
>
> this problem seems to develop into a nightmare. I executed a find on the file system and had some initial success. The number of stray files dropped by about 8%. Unfortunately, this is about it. I'm running a find now also on snap dirs, but I don't have much hope. There must be a way to find out what is accumulating in the stray buckets. As I wrote in another reply to this thread, I can't dump the trees:
>
> > I seem to have a problem. I cannot dump the mds tree:
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0' 0
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir' 0
> > root inode is not in cache
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 get subtrees | grep path
> >     "path": "",
> >     "path": "~mds0",
>
> However, this information is somewhere in rados objects and it should be possible to figure something out similar to
>
> # rados getxattr --pool=con-fs2-meta1 <OBJ_ID> parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
> # rados listomapkeys --pool=con-fs2-meta1 <OBJ_ID>
>
> What OBJ_IDs am I looking for? How and where can I start to traverse the structure? Version is mimic latest stable.
>
> Thanks for your help,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 17 January 2022 09:35:02
> To: Patrick Donnelly
> Cc: Frank Schilder; ceph-users
> Subject: Re: Re: cephfs: [ERR] loaded dup inode
>
> On Sun, Jan 16, 2022 at 3:54 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> >
> > Hi Dan,
> >
> > On Fri, Jan 14, 2022 at 6:32 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > We had this long ago related to a user generating lots of hard links.
> > > Snapshots will have a similar effect.
> > > (in these cases, if a user deletes the original file, the file goes
> > > into stray until it is "reintegrated").
> > >
> > > If you can find the dir where they're working, `ls -lR` will force
> > > those to reintegrate (you will see because the num strays will drop
> > > back down).
> > > You might have to ls -lR in a snap directory, or in the current tree
> > > -- you have to browse around and experiment.
> > >
> > > pacific does this re-integration automatically.
> >
> > This reintegration is still not automatic (i.e. the MDS does not have
> > a mechanism (yet) for hunting for the dentry to do reintegration).
> > The next version (planned) of Pacific will have reintegration
> > triggered by recursive scrub:
> >
> > https://github.com/ceph/ceph/pull/44514
> >
> > which is significantly less disruptive than `ls -lR` or `find`.
>
> Oops, sorry, my bad.
> I was thinking about https://github.com/ceph/ceph/pull/33479
>
> Cheers, Dan
>
> >
> > --
> > Patrick Donnelly, Ph.D.
> > He / Him / His
> > Principal Software Engineer
> > Red Hat, Inc.
> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx