Hi Frank,
# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | sort_by(.nk)' pgs.dump | tail
},
{
"id": "12.17b",
"nk": 1493776
},
{
"id": "12.193",
"nk": 1583589
}
Those numbers are > 1 million while the warning threshold is 200k, so
a warning is expected.
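For reference, rather than eyeballing the tail, the dump can be filtered against the threshold directly. This is an untested sketch against a toy JSON document (the real input would be your pgs.dump); note the warning itself is evaluated per object during deep scrub, so the per-PG sum is only an upper-bound indicator:

```shell
# Toy stand-in for pgs.dump: one PG over the 200000-key default
# threshold (osd_deep_scrub_large_omap_object_key_threshold), one under.
echo '{"pg_stats":[{"pgid":"12.17b","stat_sum":{"num_omap_keys":1493776}},
                   {"pgid":"12.2","stat_sum":{"num_omap_keys":100}}]}' \
  | jq '[.pg_stats[] | {id: .pgid, nk: .stat_sum.num_omap_keys}
        | select(.nk > 200000)]'
```

On a real cluster the same `select(.nk > 200000)` filter drops straight into the sort_by one-liners above.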
Quoting Frank Schilder <frans@xxxxxx>:
Hi Eugen,
thanks for the one-liner :) I'm afraid I'm in the same position as
before though.
I dumped all PGs to a file and executed these 2 commands:
# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}] | sort_by(.nk)' pgs.dump | tail
},
{
"id": "12.193",
"nk": 1002401056
},
{
"id": "21.0",
"nk": 1235777228
}
]
# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | sort_by(.nk)' pgs.dump | tail
},
{
"id": "12.17b",
"nk": 1493776
},
{
"id": "12.193",
"nk": 1583589
}
]
Neither is beyond the warn limit and pool 12 is indeed the pool
where the warnings came from. OK, now back to the logs:
# zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*
/var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200
osd.592 (osd.592) 104 : cluster [WRN] Large omap object found.
Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key
count: 200001 Size (bytes): 230080309
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200
osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found.
Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key
count: 200243 Size (bytes): 230307097
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200
osd.988 (osd.988) 4365 : cluster [WRN] Large omap object found.
Object: 12:eb96322f:::637.00000000:head PG: 12.f44c69d7 (12.1d7) Key
count: 200329 Size (bytes): 230310393
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200
osd.50 (osd.50) 4549 : cluster [WRN] Large omap object found.
Object: 12:08fb0eb7:::635.00000000:head PG: 12.ed70df10 (12.110) Key
count: 200183 Size (bytes): 230150641
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200
osd.18 (osd.18) 7011 : cluster [WRN] Large omap object found.
Object: 12:d6758956:::634.00000000:head PG: 12.6a91ae6b (12.6b) Key
count: 200247 Size (bytes): 230343371
/var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200
osd.980 (osd.980) 308 : cluster [WRN] Large omap object found.
Object: 12:63f985e7:::639.00000000:head PG: 12.e7a19fc6 (12.1c6) Key
count: 200160 Size (bytes): 230179282
/var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200
osd.563 (osd.563) 3661 : cluster [WRN] Large omap object found.
Object: 12:44346421:::632.00000000:head PG: 12.84262c22 (12.22) Key
count: 200325 Size (bytes): 230481029
/var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200
osd.949 (osd.949) 7088 : cluster [WRN] Large omap object found.
Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key
count: 200327 Size (bytes): 230404872
/var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200
osd.592 (osd.592) 461 : cluster [WRN] Large omap object found.
Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key
count: 200228 Size (bytes): 230347361
The warnings report a key count > 200000, but none of the PGs in the
dump does. Apparently, all these PGs were (deep-)scrubbed already
and the omap key count was updated (or am I misunderstanding
something here?). I still cannot tell which PG the warning
originates from. As far as I can tell, the warning should not be
there.
Do you have an idea how to continue diagnosis from here apart from
just trying a deep scrub on all PGs in the list from the log?
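In case it helps: the large-omap flag is only re-evaluated when the offending PG is (deep-)scrubbed, so kicking off scrubs for exactly the PGs named in the log is probably the cheapest next step. A hedged sketch (the short pgid is the parenthesized value in each warning; the actual ceph call is commented out so this runs without a cluster):

```shell
# Stand-in for "zgrep -h 'Large omap object found' /var/log/ceph/ceph.log-*":
# two of the warning lines quoted above.
cat > ceph.log.sample <<'EOF'
osd.592 (osd.592) 104 : cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200001 Size (bytes): 230080309
osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found. Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key count: 200243 Size (bytes): 230307097
EOF
# Extract the short pgid in parentheses, deduplicate, and (on a real
# cluster) deep-scrub each one:
sed -n 's/.*PG: [^ ]* (\([^)]*\)).*/\1/p' ceph.log.sample | sort -u
# ... | while read -r pg; do ceph pg deep-scrub "$pg"; done
```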
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, October 16, 2023 1:41 PM
To: ceph-users@xxxxxxx
Subject: Re: find PG with large omap object
Hi,
not sure if this is what you need, but if you know the pool id (you
probably should) you could try this; it's from an Octopus test cluster
(assuming the warning was for the number of keys, not bytes):
$ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[]
    | select(.pgid | startswith("17."))
    | .pgid + " " + "\(.stat_sum.num_omap_keys)"'
17.6 191
17.7 759
17.4 358
17.5 0
17.2 177
17.3 1
17.0 375
17.1 176
If you don't know the pool, you could sort the output by the second
column and see which PG has the largest number of omap_keys.
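The sorting can be handed to sort(1) on the second column; an untested sketch, here fed with the sample output above so it runs without a cluster:

```shell
# Numeric sort on column 2 brings the largest omap key counts to the
# bottom. On a cluster, generate the pairs with:
#   ceph -f json pg dump pgs 2>/dev/null \
#     | jq -r '.pg_stats[] | .pgid + " " + "\(.stat_sum.num_omap_keys)"'
printf '17.6 191\n17.7 759\n17.4 358\n17.5 0\n17.2 177\n' \
  | sort -k2 -n | tail -n 3
```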
Regards,
Eugen
Quoting Frank Schilder <frans@xxxxxx>:
Hi all,
we had a bunch of large omap object warnings after a user deleted a
lot of files on a ceph fs with snapshots. After the snapshots were
rotated out, all but one of these warnings disappeared over time.
However, one warning is stuck and I wonder if it's something else.
Is there a reasonable way (say, a one-liner with no more than 120
characters) to get ceph to tell me which PG this is coming from? I
just want to issue a deep scrub to check if it disappears and going
through the logs and querying every single object for its key count
seems a bit of a hassle for something that ought to be part of "ceph
health detail".
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx