Re: find PG with large omap object

Hi Frank,

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | sort_by(.nk)' pgs.dump | tail
  },
  {
    "id": "12.17b",
    "nk": 1493776
  },
  {
    "id": "12.193",
    "nk": 1583589
  }
]

Those numbers are > 1 million, while the warning threshold is 200k, so a warning is expected.
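
If you want to double-check the thresholds your cluster actually applies, you can query the two relevant config options (option names as of Nautilus and later, if I'm not mistaken; the defaults are 200000 keys and 1 GiB of omap bytes):

$ ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
$ ceph config get osd osd_deep_scrub_large_omap_object_value_sum_threshold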

Quoting Frank Schilder <frans@xxxxxx>:

Hi Eugen,

thanks for the one-liner :) I'm afraid I'm in the same position as before though.

I dumped all PGs to a file and executed these 2 commands:

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}] | sort_by(.nk)' pgs.dump | tail
  },
  {
    "id": "12.193",
    "nk": 1002401056
  },
  {
    "id": "21.0",
    "nk": 1235777228
  }
]

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | sort_by(.nk)' pgs.dump | tail
  },
  {
    "id": "12.17b",
    "nk": 1493776
  },
  {
    "id": "12.193",
    "nk": 1583589
  }
]
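
A select() filter on the same dump would list only the PGs above the threshold directly; a sketch, assuming the default 200000-key limit:

# jq '[.pg_stats[] | select(.stat_sum.num_omap_keys > 200000) | {"id": .pgid, "nk": .stat_sum.num_omap_keys}]' pgs.dump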

Neither is beyond the warn limit, and pool 12 is indeed the pool where the warnings came from. OK, now back to the logs:

# zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*
/var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200 osd.592 (osd.592) 104 : cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200001 Size (bytes): 230080309
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200 osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found. Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key count: 200243 Size (bytes): 230307097
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200 osd.988 (osd.988) 4365 : cluster [WRN] Large omap object found. Object: 12:eb96322f:::637.00000000:head PG: 12.f44c69d7 (12.1d7) Key count: 200329 Size (bytes): 230310393
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200 osd.50 (osd.50) 4549 : cluster [WRN] Large omap object found. Object: 12:08fb0eb7:::635.00000000:head PG: 12.ed70df10 (12.110) Key count: 200183 Size (bytes): 230150641
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200 osd.18 (osd.18) 7011 : cluster [WRN] Large omap object found. Object: 12:d6758956:::634.00000000:head PG: 12.6a91ae6b (12.6b) Key count: 200247 Size (bytes): 230343371
/var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200 osd.980 (osd.980) 308 : cluster [WRN] Large omap object found. Object: 12:63f985e7:::639.00000000:head PG: 12.e7a19fc6 (12.1c6) Key count: 200160 Size (bytes): 230179282
/var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200 osd.563 (osd.563) 3661 : cluster [WRN] Large omap object found. Object: 12:44346421:::632.00000000:head PG: 12.84262c22 (12.22) Key count: 200325 Size (bytes): 230481029
/var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200 osd.949 (osd.949) 7088 : cluster [WRN] Large omap object found. Object: 12:c9a32586:::63a.00000000:head PG: 12.61a4c593 (12.193) Key count: 200327 Size (bytes): 230404872
/var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200 osd.592 (osd.592) 461 : cluster [WRN] Large omap object found. Object: 12:c05de58b:::63b.00000000:head PG: 12.d1a7ba03 (12.3) Key count: 200228 Size (bytes): 230347361
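
To boil those warnings down to a unique list of PG ids, something like this should work (a sketch that assumes the log format shown above, where the PG id is the only parenthesized number):

# zgrep -h 'Large omap object found' /var/log/ceph/ceph.log-* | grep -oE '\([0-9]+\.[0-9a-f]+\)' | tr -d '()' | sort -u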

The warnings report a key count > 200000, but none of the PGs in the dump does. Apparently, all these PGs have already been (deep-)scrubbed and the omap key count was updated (or am I misunderstanding something here?). I still cannot conclude which PG the warning originates from. As far as I can tell, the warning should not be there.

Do you have an idea how to continue the diagnosis from here, apart from just trying a deep scrub on all PGs in the list from the log?
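
If nothing better comes up, I guess a short loop over the ids extracted above would at least cover the whole list in one go (a sketch; the PG ids below are the ones from the log excerpt):

# for pg in 12.3 12.22 12.6b 12.110 12.193 12.1c6 12.1d7; do ceph pg deep-scrub "$pg"; done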

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, October 16, 2023 1:41 PM
To: ceph-users@xxxxxxx
Subject: Re: find PG with large omap object

Hi,
not sure if this is what you need, but if you know the pool id (you
probably should), you could try this; it's from an Octopus test cluster
(assuming the warning was for the number of keys, not bytes):

$ ceph -f json pg dump pgs 2>/dev/null \
    | jq -r '.pg_stats[] | select(.pgid | startswith("17.")) | .pgid + " " + "\(.stat_sum.num_omap_keys)"'
17.6 191
17.7 759
17.4 358
17.5 0
17.2 177
17.3 1
17.0 375
17.1 176

If you don't know the pool, you could sort the output by the second
column and see which PG has the largest number of omap_keys.
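
For example (a numeric sort on the second column, largest last):

$ ceph -f json pg dump pgs 2>/dev/null \
    | jq -r '.pg_stats[] | .pgid + " " + "\(.stat_sum.num_omap_keys)"' \
    | sort -n -k2 | tail -5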

Regards,
Eugen

Quoting Frank Schilder <frans@xxxxxx>:

Hi all,

we had a bunch of large omap object warnings after a user deleted a
lot of files on a ceph fs with snapshots. After the snapshots were
rotated out, all but one of these warnings disappeared over time.
However, one warning is stuck, and I wonder if it's something else.

Is there a reasonable way (say, a one-liner of no more than 120
characters) to get ceph to tell me which PG this is coming from? I
just want to issue a deep scrub to check whether it disappears;
going through the logs and querying every single object for its key
count seems a bit of a hassle for something that ought to be part of
"ceph health detail".

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



