Re: LARGE_OMAP_OBJECTS: any proper action possible?


 



Hi Dan and Patrick (and, for the record, other users).

The large omap objects disappeared after I deleted a static snapshot in a disjoint directory tree; see https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/3VSDGMNJUSNQIAWM7IMMDZPTM3GWGCSZ/. Apparently, deleted hard links can block snap trimming of more than just the hard-linked data in snapshots. I have reverted all warning settings back to their defaults.
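
For the record, by "warning settings" I mean the large-omap thresholds. Reverting them is just (a minimal sketch, assuming they had been set cluster-wide via ceph config):

ceph config rm osd osd_deep_scrub_large_omap_object_key_threshold
ceph config rm osd osd_deep_scrub_large_omap_object_value_sum_threshold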

I now take rotating snapshots only on the file system root, and the cluster behaves much better in terms of stray counts and fs client performance. All sorts of weird issues disappeared after switching to root-only snapshots.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 31 August 2021 16:27:10
To: Dan van der Ster
Cc: Patrick Donnelly; ceph-users
Subject:  Re: LARGE_OMAP_OBJECTS: any proper action possible?

Hi Dan,

unfortunately, the file/directory names were generated the way one would for temporary files, so they give no clue about their location. I would need to find such a file while it still exists. Of course, I could execute a find on the snapshot ...

Just kidding. The large omap count is already going down; the first 4 have probably been purged from the snapshots.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 31 August 2021 15:44:41
To: Frank Schilder
Cc: Patrick Donnelly; ceph-users
Subject: Re:  LARGE_OMAP_OBJECTS: any proper action possible?

Hi,

I don't know how to find a full path from a dir object.
But perhaps you can make an educated guess based on what you see in:

rados listomapkeys --pool=con-fs2-meta1 1000eec35f5.01000000 | head -n 100

Those should be the directory entries. (s/_head//)
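
For example, something like this (untested, standard sed/head) should show the first 100 entry names with the _head suffix stripped:

rados listomapkeys --pool=con-fs2-meta1 1000eec35f5.01000000 | sed 's/_head$//' | head -n 100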

-- Dan

On Tue, Aug 31, 2021 at 2:31 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Dear Dan and Patrick,
>
> the find didn't return anything. With this and the info below, am I right to assume that these were temporary working directories that got caught in a snapshot (we use rolling snapshots)?
>
> I would really appreciate any ideas on how to find out the original file system path of these large directories. I would like to advise the user(s) that we have a special high-performance file system for temporary data.
>
> I can't find any indication of performance problems with the metadata pool. After redeploying the OSDs and quadrupling the OSD count, the metadata pool seems to perform very well. The find ran over a 1.3 PB file system in under 18 hours.
>
> However, running this find on the root got me caught in another problem: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/HKEBXXRMX5WA5Y6JFM34WFPMWTCMPFCG/#EMHNSHZIPFZZ5QYS6B4VW3LUGL6HDTOP
>
> Apparently, the metadata performance is now so high that a single client can crash an MDS daemon and even take the whole MDS cluster down with it.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder
> Sent: 30 August 2021 16:18:02
> To: ceph-users
> Cc: Dan van der Ster; Patrick Donnelly
> Subject: Re:  LARGE_OMAP_OBJECTS: any proper action possible?
>
> Dear Dan and Patrick,
>
> I suspect that I'm looking at large directories in the snapshots that no longer exist on the file system. Hence, the omap objects are not fragmented as explained in the tracker issue. Here is the info you asked me to pull out:
>
> > find /cephfs -type d -inum 1099738108263
>
> The find hasn't returned yet. It would be great to find out which user is doing this. Unfortunately, I don't believe the directory still exists.
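>
> For reference, the -inum value is just the hex object-name prefix converted to decimal, e.g. with a shell builtin:
>
> # printf '%d\n' 0x1000d7fd167
> 1099738108263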
>
> > rados -p cephfs_metadata listomapkeys 1000d7fd167.02800000
>
> I did this on a different object:
>
> # rados listomapkeys --pool=con-fs2-meta1 1000eec35f5.01000000 | wc -l
> 216000
>
> This matches the log message. I guess these keys are file/dir names? If so, then yes, it's a huge directory.
>
> > Please try the resolutions suggested in: https://tracker.ceph.com/issues/45333
>
> If I understand correctly, the INODE.00000000 objects contain the path information:
>
> [root@gnosis ~]# rados listxattr --pool=con-fs2-meta1 1000eec35f5.01000000
> [root@gnosis ~]# rados listxattr --pool=con-fs2-meta1 1000eec35f5.00000000
> layout
> parent
>
> Decoding the meta info in the parent attribute gives:
>
> [root@gnosis ~]# rados getxattr --pool=con-fs2-meta1 1000eec35f5.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
> {
>     "ino": 1099761989109,
>     "ancestors": [
>         {
>             "dirino": 1552,
>             "dname": "1000eec35f5",
>             "version": 882614706
>         },
>         {
>             "dirino": 257,
>             "dname": "stray6",
>             "version": 563853824
>         }
>     ],
>     "pool": 12,
>     "old_pools": []
> }
>
> This smells a lot like a deleted directory in a snapshot, moved to one of the stray buckets. The result is essentially the same for all large omap objects, except for the stray number. Is it possible to figure out the original path in the file system?
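>
> For a directory that is still linked into the tree, I assume one could reassemble the full path from that ancestors list (read in reverse), e.g. with jq:
>
> # rados getxattr --pool=con-fs2-meta1 1000eec35f5.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json | jq -r '[.ancestors | reverse | .[].dname] | join("/")'
> stray6/1000eec35f5
>
> Here, of course, that only yields the stray bucket entry, which is why I'm asking.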
>
> I guess I have to either increase the warning threshold or live with the warning message, neither of which I prefer. It would be great if you could help me find the original path so I can identify the user and advise him/her on how to organise his/her files.
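>
> If I do raise the threshold, I assume the relevant option is osd_deep_scrub_large_omap_object_key_threshold (default 200000 keys), e.g.:
>
> ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 400000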
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> Sent: 27 August 2021 19:16:16
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re:  LARGE_OMAP_OBJECTS: any proper action possible?
>
> Hi Frank,
>
> On Wed, Aug 25, 2021 at 6:27 AM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi all,
> >
> > I have the notorious "LARGE_OMAP_OBJECTS: 4 large omap objects" warning and am again wondering whether there is any proper action one can take other than "wait it out and deep-scrub" (numerous ceph-users threads) or "ignore" (https://docs.ceph.com/en/latest/rados/operations/health-checks/#large-omap-objects). A proper action is described only for RGWs, but mine come from MDSes. Is there any way to ask an MDS to clean up or split the objects?
> >
> > The disks backing the metadata pool can easily deal with objects of this size. My question is more along the lines of: if I can't do anything anyway, why the warning? If there is a warning, I would assume there is some proper action that prevents large omap objects from being created by an MDS in the first place. What is it?
>
> Please try the resolutions suggested in: https://tracker.ceph.com/issues/45333
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



