Large OMAP Objects & Pubsub

Hi All,

I'm looking for advice on an issue my clusters have been suffering from. I realize there is a lot of text below — thanks in advance for your consideration.

The cluster has a health warning of "32 large omap objects", which has persisted for several months.

The cluster appears functional and there are no indications of a performance problem at the client for now (no slow ops; everything seems to work fine). It is a multisite cluster with CephFS and S3 in use, as well as pubsub, running Ceph 15.2.13.

We have been running automated client load tests against this system every day for a year or longer. The key counts of the large OMAP objects in question are growing; I've monitored this over a period of several months. Intuitively, I gather this means that at some point I will hit performance problems as a result.
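For context, this is roughly how I've been collecting the key counts — a minimal sketch (pool and object names are taken from the health output below; it only uses `rados listomapkeys`, so it's read-only, though listing millions of keys can be slow):

```shell
# Sketch: count omap keys on each datalog shard object in the pubsub
# log pool, largest first. Read-only, but slow on very large objects.
POOL=siteApubsub.rgw.log
for obj in $(rados -p "$POOL" ls | grep '^data_log\.'); do
    printf '%s %s\n' "$obj" "$(rados -p "$POOL" listomapkeys "$obj" | wc -l)"
done | sort -k2 -rn
```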

The large OMAP objects are split across two pools: siteApubsub.rgw.log and siteApubsub.rgw.buckets.index. My client is responsible for processing the pubsub queue, and it appears to be doing that successfully: there are no objects in the pubsub data pool, as shown in the details below.

I've been keeping a spreadsheet to track the growth of these. Assuming I can't attach a file to the mailing list, I've uploaded an image of it here: https://imgur.com/a/gAtAcvp. The data shows constant growth of all of these objects over the last couple of months. It also includes the names of the objects, which fall into two categories:

  *   16 instances of objects with names like: 9:03d18f4d:::data_log.47:head
  *   16 instances of objects with names like: 13:0118e6b8:::.dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15:head
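For what it's worth, my understanding (please correct me if wrong) is that the `data_log.N` objects are RGW datalog shards used by multisite sync, and the `.dir.<marker>.N` objects are bucket index shards. Assuming that naming convention holds, the bucket instance marker can be recovered from the object name like this:

```shell
# Sketch: recover the bucket instance marker from a ".dir." index
# object name (assumes ".dir.<marker>.<shard>" naming for sharded
# indexes; non-sharded indexes have no trailing shard number).
OBJ=".dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15"
MARKER=${OBJ#.dir.}   # drop the leading ".dir." prefix
MARKER=${MARKER%.*}   # drop the trailing shard number
echo "$MARKER"
# The marker can then be cross-referenced against "radosgw-admin
# bucket stats" output, which reports a "marker"/"id" per bucket.
```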

Please find output of a few Ceph commands below giving details of the cluster.

  *   I'm really keen to understand this better and would be more than happy to share additional diags.
  *   I'd like to understand what I need to do to remove these large OMAP objects and prevent future build-ups, so I don't need to worry about the stability of this system.
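In case it helps frame an answer: the approach I've been considering is sketched below. This is my own reading of the Octopus docs, not something I've run yet, so please sanity-check both the approach and the exact flags before I go ahead.

```shell
# 1. Confirm multisite sync is caught up; stale sync would explain the
#    datalog shards retaining entries.
radosgw-admin sync status
radosgw-admin datalog status

# 2. If all zones are caught up, trim old datalog entries. The marker
#    is a placeholder here; I'd take it from "datalog status" output:
# radosgw-admin datalog trim --shard-id=47 --end-marker=<marker>

# 3. For the bucket index shards, resharding would spread the keys
#    across more RADOS objects. My understanding is that resharding in
#    a multisite deployment on Octopus needs special care, so I'd treat
#    this as a last resort:
# radosgw-admin bucket reshard --bucket=<bucket> --num-shards=<n>
```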

Thanks,
Alex


$ ceph -s
  cluster:
    id:     0b91b8be-3e01-4240-bea5-df01c7e53b7c
    health: HEALTH_WARN
            32 large omap objects

  services:
    mon: 3 daemons, quorum albans_sc0,albans_sc1,albans_sc2 (age 6w)
    mgr: albans_sc2(active, since 6w), standbys: albans_sc1, albans_sc0
    mds: cephfs:1 {0=albans_sc2=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 6w), 3 in (since 10M)
    rgw: 6 daemons active (albans_sc0.pubsub, albans_sc0.rgw0, albans_sc1.pubsub, albans_sc1.rgw0, albans_sc2.pubsub, albans_sc2.rgw0)

  task status:

  data:
    pools:   14 pools, 137 pgs
    objects: 4.52M objects, 160 GiB
    usage:   536 GiB used, 514 GiB / 1.0 TiB avail
    pgs:     137 active+clean

  io:
    client:   28 MiB/s rd, 1.2 MiB/s wr, 673 op/s rd, 189 op/s wr


$ ceph health detail
HEALTH_WARN 32 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 32 large omap objects
    16 large objects found in pool 'siteApubsub.rgw.log'
    16 large objects found in pool 'siteApubsub.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.

$ ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    1.0 TiB  514 GiB  496 GiB   536 GiB      51.07
TOTAL  1.0 TiB  514 GiB  496 GiB   536 GiB      51.07

--- POOLS ---
POOL                           ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics           1    1      0 B        0      0 B      0    153 GiB
cephfs_data                     2   32  135 GiB    1.99M  415 GiB  47.50    153 GiB
cephfs_metadata                 3   32  3.3 GiB    2.09M  9.8 GiB   2.09    153 GiB
siteA.rgw.buckets.data          4   32   24 GiB  438.62k   80 GiB  14.88    153 GiB
.rgw.root                       5    4   19 KiB       29  1.3 MiB      0    153 GiB
siteA.rgw.log                   6    4   79 MiB      799  247 MiB   0.05    153 GiB
siteA.rgw.control               7    4      0 B        8      0 B      0    153 GiB
siteA.rgw.meta                  8    4   13 KiB       37  1.6 MiB      0    153 GiB
siteApubsub.rgw.log             9    4  1.9 GiB      789  5.7 GiB   1.22    153 GiB
siteA.rgw.buckets.index        10    4  456 MiB       31  1.3 GiB   0.29    153 GiB
siteApubsub.rgw.control        11    4      0 B        8      0 B      0    153 GiB
siteApubsub.rgw.meta           12    4   11 KiB       40  1.7 MiB      0    153 GiB
siteApubsub.rgw.buckets.index  13    4  2.0 GiB       47  6.1 GiB   1.31    153 GiB
siteApubsub.rgw.buckets.data   14    4      0 B        0      0 B      0    153 GiB





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


