Hi Eugen,

the warning threshold is per omap object, not per PG (which apparently has more than one omap object). Still, I misread the numbers by one order of magnitude, which means the difference between the last two entries is about 100,000 keys. That does point towards a large omap object in PG 12.193. I issued a deep-scrub on this PG and the warning is resolved.
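
For reference, the deep scrub was triggered with the standard per-PG command, and its completion can be checked via pg query (the JSON path below is what our release prints and may differ slightly on other versions):

# ceph pg deep-scrub 12.193
# ceph pg 12.193 query | jq -r '.info.stats.last_deep_scrub_stamp'
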
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, October 16, 2023 2:41 PM
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Re: find PG with large omap object

Hi Frank,

> # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | sort_by(.nk)' pgs.dump | tail
> },
> {
>   "id": "12.17b",
>   "nk": 1493776
> },
> {
>   "id": "12.193",
>   "nk": 1583589
> }

those numbers are > 1 million and the warning threshold is 200k, so a warning is expected.

Zitat von Frank Schilder <frans@xxxxxx>:

> Hi Eugen,
>
> thanks for the one-liner :) I'm afraid I'm in the same position as before though.
>
> I dumped all PGs to a file and executed these two commands:
>
> # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}] | sort_by(.nk)' pgs.dump | tail
> },
> {
>   "id": "12.193",
>   "nk": 1002401056
> },
> {
>   "id": "21.0",
>   "nk": 1235777228
> }
> ]
>
> # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | sort_by(.nk)' pgs.dump | tail
> },
> {
>   "id": "12.17b",
>   "nk": 1493776
> },
> {
>   "id": "12.193",
>   "nk": 1583589
> }
> ]
>
> Neither is beyond the warn limit, and pool 12 is indeed the pool the warnings came from. OK, now back to the logs:
>
> # zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*
> /var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200 osd.592 (osd.592) 104 : cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200001 Size (bytes): 230080309
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200 osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found. Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key count: 200243 Size (bytes): 230307097
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200 osd.988 (osd.988) 4365 : cluster [WRN] Large omap object found. Object: 12:eb96322f:::637.00000000:head PG: 12.f44c69d7 (12.1d7) Key count: 200329 Size (bytes): 230310393
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200 osd.50 (osd.50) 4549 : cluster [WRN] Large omap object found. Object: 12:08fb0eb7:::635.00000000:head PG: 12.ed70df10 (12.110) Key count: 200183 Size (bytes): 230150641
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200 osd.18 (osd.18) 7011 : cluster [WRN] Large omap object found. Object: 12:d6758956:::634.00000000:head PG: 12.6a91ae6b (12.6b) Key count: 200247 Size (bytes): 230343371
> /var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200 osd.980 (osd.980) 308 : cluster [WRN] Large omap object found. Object: 12:63f985e7:::639.00000000:head PG: 12.e7a19fc6 (12.1c6) Key count: 200160 Size (bytes): 230179282
> /var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200 osd.563 (osd.563) 3661 : cluster [WRN] Large omap object found. Object: 12:44346421:::632.00000000:head PG: 12.84262c22 (12.22) Key count: 200325 Size (bytes): 230481029
> /var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200 osd.949 (osd.949) 7088 : cluster [WRN] Large omap object found. Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key count: 200327 Size (bytes): 230404872
> /var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200 osd.592 (osd.592) 461 : cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200228 Size (bytes): 230347361
>
> The warnings report a key count > 200000, but none of the PGs in the dump does. Apparently, all these PGs were (deep-)scrubbed already and the omap key count was updated (or am I misunderstanding something here?). I still cannot tell which PG the warning originates from. As far as I can tell, the warning should not be there.
>
> Do you have an idea how to continue the diagnosis from here, apart from just trying a deep scrub on all PGs in the list from the log?
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: Monday, October 16, 2023 1:41 PM
> To: ceph-users@xxxxxxx
> Subject: Re: find PG with large omap object
>
> Hi,
>
> not sure if this is what you need, but if you know the pool id (you probably should), you could try this; it's from an Octopus test cluster (assuming the warning was for the number of keys, not bytes):
>
> $ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | select(.pgid | startswith("17.")) | .pgid + " " + "\(.stat_sum.num_omap_keys)"'
> 17.6 191
> 17.7 759
> 17.4 358
> 17.5 0
> 17.2 177
> 17.3 1
> 17.0 375
> 17.1 176
>
> If you don't know the pool, you could sort the output by the second column and see which PG has the largest number of omap_keys.
>
> Regards,
> Eugen
>
> Zitat von Frank Schilder <frans@xxxxxx>:
>
>> Hi all,
>>
>> we had a bunch of large omap object warnings after a user deleted a lot of files on a CephFS with snapshots. After the snapshots were rotated out, all but one of these warnings disappeared over time. However, one warning is stuck, and I wonder if it's something else.
>>
>> Is there a reasonable way (say, a one-liner with no more than 120 characters) to get ceph to tell me which PG this is coming from? I just want to issue a deep scrub to check if it disappears, and going through the logs and querying every single object for its key count seems a bit of a hassle for something that ought to be part of "ceph health detail".
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
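
For completeness, a pool-agnostic variant of Eugen's one-liner (rank all PGs by omap key count and show the ten largest) could look like the following. It relies only on the same JSON fields used earlier in the thread, but take it as an untested sketch:

$ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | "\(.stat_sum.num_omap_keys) \(.pgid)"' | sort -n | tail

A deep scrub of the top entry (ceph pg deep-scrub <pgid>) should then show whether the large omap object is still present.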