Hi Eugen,

thanks for the one-liner :) I'm afraid I'm in the same position as before, though. I dumped all PGs to a file and executed these two commands:

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}] | sort_by(.nk)' pgs.dump | tail
  },
  {
    "id": "12.193",
    "nk": 1002401056
  },
  {
    "id": "21.0",
    "nk": 1235777228
  }
]

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | sort_by(.nk)' pgs.dump | tail
  },
  {
    "id": "12.17b",
    "nk": 1493776
  },
  {
    "id": "12.193",
    "nk": 1583589
  }
]

Neither is beyond the warn limit, and pool 12 is indeed the pool the warnings came from. OK, now back to the logs:

# zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*
/var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200 osd.592 (osd.592) 104 : cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200001 Size (bytes): 230080309
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200 osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found. Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key count: 200243 Size (bytes): 230307097
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200 osd.988 (osd.988) 4365 : cluster [WRN] Large omap object found. Object: 12:eb96322f:::637.00000000:head PG: 12.f44c69d7 (12.1d7) Key count: 200329 Size (bytes): 230310393
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200 osd.50 (osd.50) 4549 : cluster [WRN] Large omap object found. Object: 12:08fb0eb7:::635.00000000:head PG: 12.ed70df10 (12.110) Key count: 200183 Size (bytes): 230150641
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200 osd.18 (osd.18) 7011 : cluster [WRN] Large omap object found. Object: 12:d6758956:::634.00000000:head PG: 12.6a91ae6b (12.6b) Key count: 200247 Size (bytes): 230343371
/var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200 osd.980 (osd.980) 308 : cluster [WRN] Large omap object found. Object: 12:63f985e7:::639.00000000:head PG: 12.e7a19fc6 (12.1c6) Key count: 200160 Size (bytes): 230179282
/var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200 osd.563 (osd.563) 3661 : cluster [WRN] Large omap object found. Object: 12:44346421:::632.00000000:head PG: 12.84262c22 (12.22) Key count: 200325 Size (bytes): 230481029
/var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200 osd.949 (osd.949) 7088 : cluster [WRN] Large omap object found. Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key count: 200327 Size (bytes): 230404872
/var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200 osd.592 (osd.592) 461 : cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200228 Size (bytes): 230347361

The warnings report a key count > 200000, but none of the PGs in the dump does. Apparently, all these PGs have been (deep-)scrubbed since and the omap key counts were updated (or am I misunderstanding something here?). I still can't tell which PG the warning originates from. As far as I can tell, the warning should not be there. Do you have an idea how to continue the diagnosis from here, apart from just trying a deep scrub on all PGs in the list from the log?
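One more direct check worth trying, as a sketch: recent Ceph releases (Nautilus and later, as far as I know) carry a per-PG counter num_large_omap_objects in the pg dump stats, so the PG(s) still flagged can be listed without inferring anything from key counts. The sample JSON below only stands in for the real pgs.dump:

```shell
# List PGs whose stats still carry a large-omap flag. The sample file
# stands in for the real pgs.dump from "ceph -f json pg dump pgs";
# the field name num_large_omap_objects is assumed to exist in this
# Ceph version's stat_sum.
cat > pgs.sample <<'EOF'
{"pg_stats":[
  {"pgid":"12.3",   "stat_sum":{"num_large_omap_objects":1}},
  {"pgid":"12.193", "stat_sum":{"num_large_omap_objects":0}}
]}
EOF
jq -r '.pg_stats[] | select(.stat_sum.num_large_omap_objects > 0) | .pgid' pgs.sample
# → 12.3
```

If the counter is present and nonzero somewhere, that PG is the one the health warning is counting; if it is zero everywhere, the warning is likely stale state that a deep scrub should clear.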
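For the brute-force route, the unique PG ids can be pulled straight out of the warnings rather than copied by hand; a dry-run sketch (the echo keeps it from touching the cluster, and the regex assumes the log format shown above):

```shell
# Extract the PG id in parentheses after "PG:" from each warning and
# print one deep-scrub command per unique PG (drop the echo to run them).
# A single sample line stands in for the real log excerpt above; against
# real logs, feed it from:
#   zgrep -h 'Large omap object found' /var/log/ceph/ceph.log-*
line='cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200001 Size (bytes): 230080309'
printf '%s\n' "$line" \
  | grep -oP 'PG: \S+ \(\K[0-9]+\.[0-9a-f]+(?=\))' \
  | sort -u \
  | while read -r pg; do echo ceph pg deep-scrub "$pg"; done
# → ceph pg deep-scrub 12.3
```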
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, October 16, 2023 1:41 PM
To: ceph-users@xxxxxxx
Subject: Re: find PG with large omap object

Hi,

not sure if this is what you need, but if you know the pool id (you probably should), you could try this; it's from an Octopus test cluster (assuming the warning was for the number of keys, not bytes):

$ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | select (.pgid | startswith("17.")) | .pgid + " " + "\(.stat_sum.num_omap_keys)"'
17.6 191
17.7 759
17.4 358
17.5 0
17.2 177
17.3 1
17.0 375
17.1 176

If you don't know the pool, you could sort the output by the second column and see which PG has the largest number of omap keys.

Regards,
Eugen

Zitat von Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> we had a bunch of large omap object warnings after a user deleted a
> lot of files on a CephFS with snapshots. After the snapshots were
> rotated out, all but one of these warnings disappeared over time.
> However, one warning is stuck and I wonder if it's something else.
>
> Is there a reasonable way (say, a one-liner with no more than 120
> characters) to get ceph to tell me which PG this is coming from? I
> just want to issue a deep scrub to check if it disappears; going
> through the logs and querying every single object for its key count
> seems a bit of a hassle for something that ought to be part of "ceph
> health detail".
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx