Re: LARGE_OMAP_OBJECTS: any proper action possible?

Frank Schilder <frans@xxxxxx> · Tue, 31 Aug 2021 12:30:58 +0000

Dear Dan and Patrick,

the find didn't return anything. With this and the info below, am I right to assume that these were temporary working directories that got caught in a snapshot (we use rolling snapshots)?

I would really appreciate any ideas on how to find out the original file system path of these large directories. I would like to advise the user(s) that we have a special high-performance file system for temporary data.

I can't find any indications of performance problems with the meta-data pool. After re-deploying the OSDs and quadrupling the OSD count, the meta-data pool seems to perform very well. The find ran over a 1.3PB file system in under 18 hours.

However, running this find on the root got me caught in another problem: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/HKEBXXRMX5WA5Y6JFM34WFPMWTCMPFCG/#EMHNSHZIPFZZ5QYS6B4VW3LUGL6HDTOP

Apparently, the meta-data performance is now so high that a single client can crash an MDS daemon and even take the MDS cluster down with it.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder
Sent: 30 August 2021 16:18:02
To: ceph-users
Cc: Dan van der Ster; Patrick Donnelly
Subject: Re:  LARGE_OMAP_OBJECTS: any proper action possible?

Dear Dan and Patrick,

I suspect that I'm looking at large directories that exist only in snapshots and no longer on the file system. Hence, the omap objects are not fragmented as explained in the tracker issue. Here is the info you asked me to pull out:

> find /cephfs -type d -inum 1099738108263

The find hasn't returned yet. It would be great to find out which user is doing that. Unfortunately, I don't believe the directory still exists.

> rados -p cephfs_metadata listomapkeys 1000d7fd167.02800000

I did this on a different object:

# rados listomapkeys --pool=con-fs2-meta1 1000eec35f5.01000000 | wc -l
216000

This matches the log message. I guess these keys are file/dir names? Then yes, it's a huge directory.
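
The object names above appear to carry the directory's inode number in hexadecimal before the dot; reading the suffix after the dot as the directory fragment is an assumption here. A minimal Python sketch of the conversion, where `object_to_inum` is a hypothetical helper name and not part of any Ceph tool:

```python
def object_to_inum(object_name: str) -> int:
    """Return the decimal inode number encoded in a metadata object name.

    Assumes the name has the form <hex inode>.<suffix>, as in the
    examples in this thread.
    """
    hex_ino, _, _suffix = object_name.partition(".")
    return int(hex_ino, 16)


print(object_to_inum("1000d7fd167.02800000"))  # 1099738108263
print(object_to_inum("1000eec35f5.01000000"))  # 1099761989109
```

The first value is the number passed to `find -inum` above; the second matches the "ino" field of the decoded backtrace further down.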

> Please try the resolutions suggested in: https://tracker.ceph.com/issues/45333

If I understand correctly, the INODE.00000000 objects contain the path information:

[root@gnosis ~]# rados listxattr --pool=con-fs2-meta1 1000eec35f5.01000000
[root@gnosis ~]# rados listxattr --pool=con-fs2-meta1 1000eec35f5.00000000
layout
parent

Decoding the meta info in the parent attribute gives:

[root@gnosis ~]# rados getxattr --pool=con-fs2-meta1 1000eec35f5.00000000 parent | ceph-dencoder type inode_backtrace_t import - decode dump_json
{
    "ino": 1099761989109,
    "ancestors": [
        {
            "dirino": 1552,
            "dname": "1000eec35f5",
            "version": 882614706
        },
        {
            "dirino": 257,
            "dname": "stray6",
            "version": 563853824
        }
    ],
    "pool": 12,
    "old_pools": []
}

This smells a lot like a deleted directory in a snapshot, moved to one of the stray buckets. The result is essentially the same for all large omap objects except for the stray number. Is it possible to figure out its original path in the file system?
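
To check many objects the same way, the decoded backtrace can be processed with a short script. This is only a sketch under stated assumptions: it relies on the JSON layout shown in the output above, reads the "ancestors" list as ordered from the nearest parent outwards (which is what the output suggests), and uses the hypothetical helper names `backtrace_path` and `is_stray`.

```python
import json

# Output of the ceph-dencoder call above, copied from this thread.
BACKTRACE_JSON = """
{
    "ino": 1099761989109,
    "ancestors": [
        {"dirino": 1552, "dname": "1000eec35f5", "version": 882614706},
        {"dirino": 257, "dname": "stray6", "version": 563853824}
    ],
    "pool": 12,
    "old_pools": []
}
"""


def backtrace_path(backtrace: dict) -> str:
    """Join the ancestor names, outermost first.

    Assumes the dump lists the nearest ancestor first, as in the output above.
    """
    names = [a["dname"] for a in backtrace["ancestors"]]
    return "/".join(reversed(names))


def is_stray(backtrace: dict) -> bool:
    """True if any ancestor is named like a stray directory (stray0, stray1, ...)."""
    return any(a["dname"].startswith("stray") for a in backtrace["ancestors"])


bt = json.loads(BACKTRACE_JSON)
print(backtrace_path(bt))  # stray6/1000eec35f5
print(is_stray(bt))        # True
print(f"{bt['ino']:x}")    # 1000eec35f5
```

For the object above, the backtrace records only the stray location, so this sketch flags stray entries but does not recover the original path.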

I guess I have to increase the warning threshold or live with the warning message, neither of which I would prefer. It would be great if you could help me find the original path so I can identify the user and advise him/her on how to organise his/her files.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: 27 August 2021 19:16:16
To: Frank Schilder
Cc: ceph-users
Subject: Re:  LARGE_OMAP_OBJECTS: any proper action possible?

Hi Frank,

On Wed, Aug 25, 2021 at 6:27 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> I have the notorious "LARGE_OMAP_OBJECTS: 4 large omap objects" warning and am again wondering if there is any proper action one can take except "wait it out and deep-scrub (numerous ceph-users threads)" or "ignore (https://docs.ceph.com/en/latest/rados/operations/health-checks/#large-omap-objects)". Only for RGWs is a proper action described, but mine come from MDSes. Is there any way to ask an MDS to clean up or split the objects?
>
> The disks with the meta-data pool can easily deal with objects of this size. My question is more along the lines: If I can't do anything anyway, why the warning? If there is a warning, I would assume that one can do something proper to prevent large omap objects from being born by an MDS. What is it?

Please try the resolutions suggested in: https://tracker.ceph.com/issues/45333

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx