https://gist.github.com/RaminNietzsche/b8702014c333f3a44a995d5a6d4a56be

On Mon, Oct 16, 2023 at 4:27 PM Frank Schilder <frans@xxxxxx> wrote:

> Hi Eugen,
>
> the warning threshold is per omap object, not per PG (which apparently
> has more than 1 omap object). Still, I misread the numbers by one order
> of magnitude, which means that the difference between the last 2 entries
> is about 100000, which does point towards a large omap object in PG
> 12.193.
>
> I issued a deep-scrub on this PG and the warning is resolved.
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
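
A minimal sketch of the steps described above, assuming a release where the
centralized config database is available; the PG id is the one identified in
this thread, and the option shown is the per-object key-count threshold
behind the 200k limit mentioned further down:

  # ask the OSDs to deep-scrub the suspect PG; the large-omap state is
  # re-evaluated when the scrub completes
  $ ceph pg deep-scrub 12.193

  # the per-object key-count threshold behind the warning (default 200000)
  $ ceph config get osd osd_deep_scrub_large_omap_object_key_threshold

  # confirm the warning has cleared
  $ ceph health detail
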
>
> ________________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: Monday, October 16, 2023 2:41 PM
> To: Frank Schilder
> Cc: ceph-users@xxxxxxx
> Subject: Re: Re: find PG with large omap object
>
> Hi Frank,
>
> > # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] |
> > sort_by(.nk)' pgs.dump | tail
> >   },
> >   {
> >     "id": "12.17b",
> >     "nk": 1493776
> >   },
> >   {
> >     "id": "12.193",
> >     "nk": 1583589
> >   }
>
> those numbers are > 1 million and the warning threshold is 200k. So a
> warning is expected.
>
> Zitat von Frank Schilder <frans@xxxxxx>:
>
> > Hi Eugen,
> >
> > thanks for the one-liner :) I'm afraid I'm in the same position as
> > before though.
> >
> > I dumped all PGs to a file and executed these 2 commands:
> >
> > # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}]
> > | sort_by(.nk)' pgs.dump | tail
> >   },
> >   {
> >     "id": "12.193",
> >     "nk": 1002401056
> >   },
> >   {
> >     "id": "21.0",
> >     "nk": 1235777228
> >   }
> > ]
> >
> > # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] |
> > sort_by(.nk)' pgs.dump | tail
> >   },
> >   {
> >     "id": "12.17b",
> >     "nk": 1493776
> >   },
> >   {
> >     "id": "12.193",
> >     "nk": 1583589
> >   }
> > ]
> >
> > Neither is beyond the warn limit and pool 12 is indeed the pool
> > where the warnings came from. OK, now back to the logs:
> >
> > # zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*
> > /var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200
> > osd.592 (osd.592) 104 : cluster [WRN] Large omap object found.
> > Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key
> > count: 200001 Size (bytes): 230080309
> > /var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200
> > osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found.
> > Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key
> > count: 200243 Size (bytes): 230307097
> > /var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200
> > osd.988 (osd.988) 4365 : cluster [WRN] Large omap object found.
> > Object: 12:eb96322f:::637.00000000:head PG: 12.f44c69d7 (12.1d7) Key
> > count: 200329 Size (bytes): 230310393
> > /var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200
> > osd.50 (osd.50) 4549 : cluster [WRN] Large omap object found.
> > Object: 12:08fb0eb7:::635.00000000:head PG: 12.ed70df10 (12.110) Key
> > count: 200183 Size (bytes): 230150641
> > /var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200
> > osd.18 (osd.18) 7011 : cluster [WRN] Large omap object found.
> > Object: 12:d6758956:::634.00000000:head PG: 12.6a91ae6b (12.6b) Key
> > count: 200247 Size (bytes): 230343371
> > /var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200
> > osd.980 (osd.980) 308 : cluster [WRN] Large omap object found.
> > Object: 12:63f985e7:::639.00000000:head PG: 12.e7a19fc6 (12.1c6) Key
> > count: 200160 Size (bytes): 230179282
> > /var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200
> > osd.563 (osd.563) 3661 : cluster [WRN] Large omap object found.
> > Object: 12:44346421:::632.00000000:head PG: 12.84262c22 (12.22) Key
> > count: 200325 Size (bytes): 230481029
> > /var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200
> > osd.949 (osd.949) 7088 : cluster [WRN] Large omap object found.
> > Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key
> > count: 200327 Size (bytes): 230404872
> > /var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200
> > osd.592 (osd.592) 461 : cluster [WRN] Large omap object found.
> > Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key
> > count: 200228 Size (bytes): 230347361
> >
> > The warnings report a key count > 200000, but none of the PGs in the
> > dump does. Apparently, all these PGs were (deep-)scrubbed already
> > and the omap key count was updated (or am I misunderstanding
> > something here?). I still don't know and can neither conclude which
> > PG the warning originates from. As far as I can tell, the warning
> > should not be there.
> >
> > Do you have an idea how to continue diagnosis from here, apart from
> > just trying a deep scrub on all PGs in the list from the log?
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Eugen Block <eblock@xxxxxx>
> > Sent: Monday, October 16, 2023 1:41 PM
> > To: ceph-users@xxxxxxx
> > Subject: Re: find PG with large omap object
> >
> > Hi,
> > not sure if this is what you need, but if you know the pool id (you
> > probably should), you could try this; it's from an Octopus test cluster
> > (assuming the warning was for the number of keys, not bytes):
> >
> > $ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | select
> > (.pgid | startswith("17.")) | .pgid + " " +
> > "\(.stat_sum.num_omap_keys)"'
> > 17.6 191
> > 17.7 759
> > 17.4 358
> > 17.5 0
> > 17.2 177
> > 17.3 1
> > 17.0 375
> > 17.1 176
> >
> > If you don't know the pool, you could sort the output by the second
> > column and see which PG has the largest number of omap_keys.
> >
> > Regards,
> > Eugen
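
A sketch of that sorting step (untested; it is the command from above with
the pool filter dropped, so all pools are covered, sorted numerically by the
key count in the second column):

  $ ceph -f json pg dump pgs 2>/dev/null | \
      jq -r '.pg_stats[] | .pgid + " " + "\(.stat_sum.num_omap_keys)"' | \
      sort -n -k2 | tail
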
> >
> > Zitat von Frank Schilder <frans@xxxxxx>:
> >
> >> Hi all,
> >>
> >> we had a bunch of large omap object warnings after a user deleted a
> >> lot of files on a ceph fs with snapshots. After the snapshots were
> >> rotated out, all but one of these warnings disappeared over time.
> >> However, one warning is stuck and I wonder if it's something else.
> >>
> >> Is there a reasonable way (say, a one-liner with no more than 120
> >> characters) to get ceph to tell me which PG this is coming from? I
> >> just want to issue a deep scrub to check if it disappears, and going
> >> through the logs and querying every single object for its key count
> >> seems a bit of a hassle for something that ought to be part of "ceph
> >> health detail".
> >>
> >> Best regards,
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
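
Putting the pieces of this thread together, a one-liner in the spirit of the
original question might look like the sketch below (untested; it lists the
five PGs with the most omap keys cluster-wide, together with their omap
bytes, since the warning can be raised for the key count or for the size):

  $ ceph -f json pg dump pgs 2>/dev/null | \
      jq -r '.pg_stats | sort_by(.stat_sum.num_omap_keys) | .[-5:][] |
        "\(.pgid)  keys=\(.stat_sum.num_omap_keys)  bytes=\(.stat_sum.num_omap_bytes)"'

As the thread suggests, these per-PG counters are refreshed by (deep) scrubs,
so they can lag behind the per-object warnings seen in the cluster log.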