Hi Yuval,

Thanks for the info. So this is a side effect of pubsub sitting on top of the RGW sync mechanism? I've re-included the ceph-users mailing list on this email in case anyone has ideas on how to alleviate this.

Some good news on my part: I've managed to clear 16 of the large OMAP objects with the instructions here [1], i.e. bilog trimming and running a deep scrub on the affected PGs. That leaves the large OMAP objects in the "siteApubsub.rgw.log" pool, which I am still hoping to find a way to clear. These are the objects of the form "9:03d18f4d:::data_log.47:head". From [2] I gather that these are used for multisite syncing, but our pubsub zones are not syncing between sites. I wonder if that makes this simply a misconfiguration, and the fix is just a correction to the config.

I've been doing some digging today and found that our pubsub zone has the following config:

{
    "id": "4f442377-4b71-4c6a-aaa9-ba945d7694f8",
    "name": "siteApubsub",
    "endpoints": [
        "https://10.225.41.200:7481",
        "https://10.225.41.201:7481",
        "https://10.225.41.202:7481"
    ],
    "log_meta": "false",
    "log_data": "true",
    "bucket_index_max_shards": 11,
    "read_only": "false",
    "tier_type": "pubsub",
    "sync_from_all": "false",
    "sync_from": [
        "siteA"
    ],
    "redirect_zone": ""
}

And sync status shows:

    source: 4f442377-4b71-4c6a-aaa9-ba945d7694f8 (siteApubsub)
            not syncing from zone

If I set the "log_data" field to false, I think that simply stops these data log entries being written, and they are not required anyway. Presumably they have been building up gradually forever, since the normal trimming does not occur when there is no multisite sync.

So my questions, for anyone who may be able to answer:

* Is the above analysis sound?
* Can I update the zone config and delete these data_log objects manually to restore my cluster to HEALTH_OK?
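For concreteness, this is roughly the procedure I have in mind. It's only a sketch, and the trim/delete step in particular is a guess on my part, so corrections are very welcome:

$ radosgw-admin zone get --rgw-zone=siteApubsub > siteApubsub.json
#   ...edit siteApubsub.json, changing "log_data" from "true" to "false"...
$ radosgw-admin zone set --rgw-zone=siteApubsub --infile=siteApubsub.json
$ radosgw-admin period update --commit

# Then clear the existing backlog, either by trimming each data_log shard
# (I believe some releases also want an --end-marker or --end-date here)...
$ radosgw-admin datalog trim --shard-id=47
# ...or by deleting the objects outright, which is the part I'm least sure is safe:
$ rados -p siteApubsub.rgw.log rm data_log.47
# ...followed by a deep scrub of the affected PGs to clear the warning, as in [1]:
$ ceph pg deep-scrub <pgid>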
Thanks,
Alex

[1] https://access.redhat.com/solutions/6450561
[2] https://www.spinics.net/lists/ceph-users/msg54282.html

From: Yuval Lifshitz <ylifshit@xxxxxxxxxx>
Sent: Thursday, October 27, 2022 5:35 PM
To: Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx>
Subject: Re: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub

Hi Alex,

I checked with the RGW people working on multisite; they say they have observed that in high-load tests (unrelated to pubsub). This means that even if this is fixed, the fix is not going to be backported to Octopus. If they have some kind of workaround, I will let you know.

Yuval

On Thu, Oct 27, 2022 at 5:50 PM Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx> wrote:

Hi Yuval,

Thanks for your reply and consideration; it's much appreciated.

We don't use Kafka (nor do I know what it is - I had a quick Google), but I think the concern is the same: if our client goes down and misses notifications from Ceph, we need Ceph to resend them until they are acknowledged. It sounds like bucket notifications with persistent notifications fit this requirement perfectly. I'll flag with my team that this is available in Pacific, and that we should take it when we move.

That said, we're still on Octopus for our main release, so while that gives us a direction for the future, I'd still like to find a solution to the initial problem, as we have slow-moving customers who might stick with Octopus for several years even after we offer a Pacific (and bucket notification) based solution.

Interestingly, we've not seen this at any customer systems, only on our heavily loaded test system. I suspect the high and regular load this system receives must be the cause of this. I've contemplated fully stopping the load for a month or so and observing the effect; I wonder if we're outpacing some clean-up mechanism (I think we've seen similar things elsewhere in our Ceph usage). However, we're fairly limited on virtualisation rig space and don't want to sit this system idle if we can avoid it.

Best wishes,
Alex

From: Yuval Lifshitz <ylifshit@xxxxxxxxxx>
Sent: Thursday, October 27, 2022 10:05 AM
To: Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx>
Subject: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub

Hi Alex,

Not sure I can help you here. We recommend using the "bucket notification" [1] mechanism over "pubsub" [2], since pubsub is not maintained, lacks many features, and will be deprecated. If you are concerned with Kafka outages, you can use persistent notifications [3] (they will retry until the broker is up again), which have been available since Ceph 16 (Pacific).
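A persistent topic is just an ordinary topic created with the "persistent" attribute set, for example (an untested sketch against the SNS-compatible API; the RGW endpoint, topic name and push endpoint below are placeholders):

$ aws --endpoint-url=http://<rgw-host>:<rgw-port> sns create-topic \
      --name=alex-notifications \
      --attributes push-endpoint=http://<your-client>:<port>/notifications,persistent=true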
It looks like an issue with the site syncing process (which drives pubsub), so I will try to figure out if there is a simple fix here.

Yuval

[1] https://docs.ceph.com/en/latest/radosgw/notifications/
[2] https://docs.ceph.com/en/latest/radosgw/pubsub-module/
[3] https://docs.ceph.com/en/latest/radosgw/notifications/#notification-reliability

On Wed, Oct 26, 2022 at 11:57 AM Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx> wrote:

Hi Yuval,

Hope you are well. I think pubsub is your area of expertise (we've briefly discussed it in the past). Would love to get your advice on the below email if possible.

Kindest regards,
Alex

________________________________
From: Alex Hussein-Kershaw (HE/HIM)
Sent: Tuesday, October 25, 2022 2:48 PM
To: Ceph Users <ceph-users@xxxxxxx>
Subject: Large OMAP Objects & Pubsub

Hi All,

Looking to get some advice on an issue my clusters have been suffering from. I realize there is a lot of text below; thanks in advance for your consideration.

The cluster has a health warning of "32 large omap objects", which has persisted for several months. The cluster appears functional and there are no indications of a performance problem at the client for now (no slow ops; everything seems to work fine). It is a multisite cluster with CephFS and S3 in use, as well as pubsub, running Ceph version 15.2.13. We run automated client load tests against this system every day and have been doing so for a year or longer.

The key counts of the large OMAP objects in question are growing; I've monitored this over a period of several months. Intuitively, I gather this means that at some point in the future I will hit performance problems as a result. The large OMAP objects are split across two pools: siteApubsub.rgw.log and siteApubsub.rgw.buckets.index. My client is responsible for processing the pubsub queue, and it appears to be doing that successfully: there are no objects in the pubsub data pool, as shown in the details below.

I've been keeping a spreadsheet to track the growth of these. Assuming I can't attach a file to the mailing list, I've uploaded an image of it here: https://imgur.com/a/gAtAcvp. The data shows constant growth of all of these objects over the last couple of months. It also includes the names of the objects, which fall into two categories:

* 16 instances of objects with names like: 9:03d18f4d:::data_log.47:head
* 16 instances of objects with names like: 13:0118e6b8:::.dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15:head

Please find the output of a few Ceph commands below giving details of the cluster.

* I'm really keen to understand this better and would be more than happy to share additional diags.
* I'd like to understand what I need to do to remove these large OMAP objects and prevent future build-ups, so I don't need to worry about the stability of this system.

Thanks,
Alex

$ ceph -s
    id:     0b91b8be-3e01-4240-bea5-df01c7e53b7c
    health: HEALTH_WARN
            32 large omap objects

  services:
    mon: 3 daemons, quorum albans_sc0,albans_sc1,albans_sc2 (age 6w)
    mgr: albans_sc2(active, since 6w), standbys: albans_sc1, albans_sc0
    mds: cephfs:1 {0=albans_sc2=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 6w), 3 in (since 10M)
    rgw: 6 daemons active (albans_sc0.pubsub, albans_sc0.rgw0, albans_sc1.pubsub, albans_sc1.rgw0, albans_sc2.pubsub, albans_sc2.rgw0)

  task status:

  data:
    pools:   14 pools, 137 pgs
    objects: 4.52M objects, 160 GiB
    usage:   536 GiB used, 514 GiB / 1.0 TiB avail
    pgs:     137 active+clean

  io:
    client:   28 MiB/s rd, 1.2 MiB/s wr, 673 op/s rd, 189 op/s wr

$ ceph health detail
HEALTH_WARN 32 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 32 large omap objects
    16 large objects found in pool 'siteApubsub.rgw.log'
    16 large objects found in pool 'siteApubsub.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.
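For reference, the per-object key counts in the spreadsheet can be reproduced with something along these lines, listing and counting the omap keys of each object named in the cluster log:

$ rados -p siteApubsub.rgw.log listomapkeys data_log.47 | wc -l
$ rados -p siteApubsub.rgw.buckets.index listomapkeys \
      .dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15 | wc -l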
$ ceph df
--- RAW STORAGE ---
CLASS    SIZE     AVAIL    USED     RAW USED    %RAW USED
ssd      1.0 TiB  514 GiB  496 GiB  536 GiB     51.07
TOTAL    1.0 TiB  514 GiB  496 GiB  536 GiB     51.07

--- POOLS ---
POOL                           ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics          1   1    0 B      0        0 B      0      153 GiB
cephfs_data                    2   32   135 GiB  1.99M    415 GiB  47.50  153 GiB
cephfs_metadata                3   32   3.3 GiB  2.09M    9.8 GiB  2.09   153 GiB
siteA.rgw.buckets.data         4   32   24 GiB   438.62k  80 GiB   14.88  153 GiB
.rgw.root                      5   4    19 KiB   29       1.3 MiB  0      153 GiB
siteA.rgw.log                  6   4    79 MiB   799      247 MiB  0.05   153 GiB
siteA.rgw.control              7   4    0 B      8        0 B      0      153 GiB
siteA.rgw.meta                 8   4    13 KiB   37       1.6 MiB  0      153 GiB
siteApubsub.rgw.log            9   4    1.9 GiB  789      5.7 GiB  1.22   153 GiB
siteA.rgw.buckets.index        10  4    456 MiB  31       1.3 GiB  0.29   153 GiB
siteApubsub.rgw.control        11  4    0 B      8        0 B      0      153 GiB
siteApubsub.rgw.meta           12  4    11 KiB   40       1.7 MiB  0      153 GiB
siteApubsub.rgw.buckets.index  13  4    2.0 GiB  47       6.1 GiB  1.31   153 GiB
siteApubsub.rgw.buckets.data   14  4    0 B      0        0 B      0      153 GiB

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx