Hi Yuval,

Thanks for the info. So this is a side effect of pubsub sitting on top of the RGW sync mechanism? I've re-included the ceph-users mailing list on this email in case anyone has ideas on how to alleviate this.

Some good news on my part: I've managed to clear 16 of the large OMAP objects with the instructions here [1], i.e. bilog trimming and running a deep scrub on the affected PGs. That leaves the large OMAP objects in the "siteApubsub.rgw.log" pool, which I am still hoping to find a way to clear. These are the objects of the form "9:03d18f4d:::data_log.47:head". From [2] I gather that these are used for multisite syncing, but our pubsub zones are not syncing between sites. I wonder if that makes this simply a misconfiguration, and the fix is just a correction to the config.

I've been doing some digging today and found that our pubsub zone has the following config:

{
    "id": "4f442377-4b71-4c6a-aaa9-ba945d7694f8",
    "name": "siteApubsub",
    "endpoints": [
        "https://10.225.41.200:7481",
        "https://10.225.41.201:7481",
        "https://10.225.41.202:7481"
    ],
    "log_meta": "false",
    "log_data": "true",
    "bucket_index_max_shards": 11,
    "read_only": "false",
    "tier_type": "pubsub",
    "sync_from_all": "false",
    "sync_from": [
        "siteA"
    ],
    "redirect_zone": ""
}

And sync status shows:

    source: 4f442377-4b71-4c6a-aaa9-ba945d7694f8 (siteApubsub)
            not syncing from zone

If I set the "log_data" field to false, I think that simply stops these data log entries being written, and they are not required anyway. Presumably they have been building up gradually forever, since the normal trimming does not occur when there is no multisite sync.

So my questions, for anyone who may be able to answer:

* Is the above analysis sound?
* Can I update the zone config and delete these data_log objects manually to restore my cluster to HEALTH_OK?
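For concreteness, this is roughly the procedure I have in mind. It's only a sketch, and the trim/delete step in particular is a guess on my part, so corrections are very welcome:

$ radosgw-admin zone get --rgw-zone=siteApubsub > siteApubsub.json
#   ...edit siteApubsub.json, changing "log_data" from "true" to "false"...
$ radosgw-admin zone set --rgw-zone=siteApubsub --infile=siteApubsub.json
$ radosgw-admin period update --commit

# Then clear the existing backlog, either by trimming each data_log shard
# (I believe some releases also want an --end-marker or --end-date here)...
$ radosgw-admin datalog trim --shard-id=47
# ...or by deleting the objects outright, which is the part I'm least sure is safe:
$ rados -p siteApubsub.rgw.log rm data_log.47
# ...followed by a deep scrub of the affected PGs to clear the warning, as in [1]:
$ ceph pg deep-scrub <pgid>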
Thanks,
Alex

[1] https://access.redhat.com/solutions/6450561
[2] https://www.spinics.net/lists/ceph-users/msg54282.html

From: Yuval Lifshitz <ylifshit@xxxxxxxxxx>
Sent: Thursday, October 27, 2022 5:35 PM
To: Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx>
Subject: Re: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub

Hi Alex,

I checked with the RGW people working on multisite; they say they have observed that in high-load tests (unrelated to pubsub). This means that even if this is fixed, the fix is not going to be backported to Octopus. If they have some kind of workaround, I will let you know.

Yuval

On Thu, Oct 27, 2022 at 5:50 PM Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx> wrote:

Hi Yuval,

Thanks for your reply and consideration; it's much appreciated.

We don't use Kafka (nor do I know what it is - I had a quick Google), but I think the concern is the same: if our client goes down and misses notifications from Ceph, we need Ceph to resend them until they are acknowledged. It sounds like bucket notifications with persistent notifications fit this requirement perfectly. I'll flag with my team that this is available in Pacific, and that we should take it when we move.

That said, we're still on Octopus for our main release, so while that gives us a direction for the future, I'd still like to find a solution to the initial problem, as we have slow-moving customers who might stick with Octopus for several years even after we offer a Pacific (and bucket notification) based solution.

Interestingly, we've not seen this at any customer systems, only on our heavily loaded test system. I suspect the high and regular load this system receives must be the cause of this. I've contemplated fully stopping the load for a month or so and observing the effect; I wonder if we're outpacing some clean-up mechanism (I think we've seen similar things elsewhere in our Ceph usage). However, we're fairly limited on virtualisation rig space and don't want to sit this system idle if we can avoid it.

Best wishes,
Alex

From: Yuval Lifshitz <ylifshit@xxxxxxxxxx>
Sent: Thursday, October 27, 2022 10:05 AM
To: Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx>
Subject: [EXTERNAL] Re: Fw: Large OMAP Objects & Pubsub

Hi Alex,

Not sure I can help you here. We recommend using the "bucket notification" [1] mechanism over "pubsub" [2], since pubsub is not maintained, lacks many features, and will be deprecated. If you are concerned with Kafka outages, you can use persistent notifications [3] (they will retry until the broker is up again), which have been available since Ceph 16 (Pacific).
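A persistent topic is just an ordinary topic created with the "persistent" attribute set, for example (an untested sketch against the SNS-compatible API; the RGW endpoint, topic name and push endpoint below are placeholders):

$ aws --endpoint-url=http://<rgw-host>:<rgw-port> sns create-topic \
      --name=alex-notifications \
      --attributes push-endpoint=http://<your-client>:<port>/notifications,persistent=true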
It looks like an issue with the site syncing process (which drives pubsub), so I will try to figure out if there is a simple fix here.

Yuval

[1] https://docs.ceph.com/en/latest/radosgw/notifications/
[2] https://docs.ceph.com/en/latest/radosgw/pubsub-module/
[3] https://docs.ceph.com/en/latest/radosgw/notifications/#notification-reliability

On Wed, Oct 26, 2022 at 11:57 AM Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx> wrote:

Hi Yuval,

Hope you are well. I think pubsub is your area of expertise (we've briefly discussed it in the past). Would love to get your advice on the below email if possible.

Kindest regards,
Alex

________________________________
From: Alex Hussein-Kershaw (HE/HIM)
Sent: Tuesday, October 25, 2022 2:48 PM
To: Ceph Users <ceph-users@xxxxxxx>
Subject: Large OMAP Objects & Pubsub

Hi All,

Looking to get some advice on an issue my clusters have been suffering from. I realize there is a lot of text below; thanks in advance for your consideration.

The cluster has a health warning of "32 large omap objects", which has persisted for several months. The cluster appears functional and there are no indications of a performance problem at the client for now (no slow ops; everything seems to work fine). It is a multisite cluster with CephFS and S3 in use, as well as pubsub, running Ceph version 15.2.13. We run automated client load tests against this system every day and have been doing so for a year or longer.

The key counts of the large OMAP objects in question are growing; I've monitored this over a period of several months. Intuitively, I gather this means that at some point in the future I will hit performance problems as a result. The large OMAP objects are split across two pools: siteApubsub.rgw.log and siteApubsub.rgw.buckets.index. My client is responsible for processing the pubsub queue, and it appears to be doing that successfully: there are no objects in the pubsub data pool, as shown in the details below.

I've been keeping a spreadsheet to track the growth of these. Assuming I can't attach a file to the mailing list, I've uploaded an image of it here: https://imgur.com/a/gAtAcvp. The data shows constant growth of all of these objects over the last couple of months. It also includes the names of the objects, which fall into two categories:

* 16 instances of objects with names like: 9:03d18f4d:::data_log.47:head
* 16 instances of objects with names like: 13:0118e6b8:::.dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15:head

Please find the output of a few Ceph commands below giving details of the cluster.

* I'm really keen to understand this better and would be more than happy to share additional diags.
* I'd like to understand what I need to do to remove these large OMAP objects and prevent future build-ups, so I don't need to worry about the stability of this system.

Thanks,
Alex

$ ceph -s
    id:     0b91b8be-3e01-4240-bea5-df01c7e53b7c
    health: HEALTH_WARN
            32 large omap objects

  services:
    mon: 3 daemons, quorum albans_sc0,albans_sc1,albans_sc2 (age 6w)
    mgr: albans_sc2(active, since 6w), standbys: albans_sc1, albans_sc0
    mds: cephfs:1 {0=albans_sc2=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 6w), 3 in (since 10M)
    rgw: 6 daemons active (albans_sc0.pubsub, albans_sc0.rgw0, albans_sc1.pubsub, albans_sc1.rgw0, albans_sc2.pubsub, albans_sc2.rgw0)

  task status:

  data:
    pools:   14 pools, 137 pgs
    objects: 4.52M objects, 160 GiB
    usage:   536 GiB used, 514 GiB / 1.0 TiB avail
    pgs:     137 active+clean

  io:
    client:   28 MiB/s rd, 1.2 MiB/s wr, 673 op/s rd, 189 op/s wr

$ ceph health detail
HEALTH_WARN 32 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 32 large omap objects
    16 large objects found in pool 'siteApubsub.rgw.log'
    16 large objects found in pool 'siteApubsub.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.
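For reference, the per-object key counts in the spreadsheet can be reproduced with something along these lines, listing and counting the omap keys of each object named in the cluster log:

$ rados -p siteApubsub.rgw.log listomapkeys data_log.47 | wc -l
$ rados -p siteApubsub.rgw.buckets.index listomapkeys \
      .dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15 | wc -l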
$ ceph df
--- RAW STORAGE ---
CLASS    SIZE     AVAIL    USED     RAW USED    %RAW USED
ssd      1.0 TiB  514 GiB  496 GiB  536 GiB     51.07
TOTAL    1.0 TiB  514 GiB  496 GiB  536 GiB     51.07

--- POOLS ---
POOL                           ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics          1   1    0 B      0        0 B      0      153 GiB
cephfs_data                    2   32   135 GiB  1.99M    415 GiB  47.50  153 GiB
cephfs_metadata                3   32   3.3 GiB  2.09M    9.8 GiB  2.09   153 GiB
siteA.rgw.buckets.data         4   32   24 GiB   438.62k  80 GiB   14.88  153 GiB
.rgw.root                      5   4    19 KiB   29       1.3 MiB  0      153 GiB
siteA.rgw.log                  6   4    79 MiB   799      247 MiB  0.05   153 GiB
siteA.rgw.control              7   4    0 B      8        0 B      0      153 GiB
siteA.rgw.meta                 8   4    13 KiB   37       1.6 MiB  0      153 GiB
siteApubsub.rgw.log            9   4    1.9 GiB  789      5.7 GiB  1.22   153 GiB
siteA.rgw.buckets.index        10  4    456 MiB  31       1.3 GiB  0.29   153 GiB
siteApubsub.rgw.control        11  4    0 B      8        0 B      0      153 GiB
siteApubsub.rgw.meta           12  4    11 KiB   40       1.7 MiB  0      153 GiB
siteApubsub.rgw.buckets.index  13  4    2.0 GiB  47       6.1 GiB  1.31   153 GiB
siteApubsub.rgw.buckets.data   14  4    0 B      0        0 B      0      153 GiB

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx