Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

Hello Ceph-Users!

On 22/12/2021 00:38, Stefan Schueffler wrote:
The other Problem, regarding the OSD scrub errors, we have this:

ceph health detail shows "PG_DAMAGED: Possible data damage: x pgs inconsistent." Every now and then new pgs get inconsistent. All inconsistent pgs belong to the buckets-index-pool de-dus5.rgw.buckets.index

ceph health detail
pg 136.1 is active+clean+inconsistent, acting [8,3,0]

rados -p de-dus5.rgw.buckets.index list-inconsistent-obj 136.1
No scrub information available for pg 136.1
error 2: (2) No such file or directory

rados list-inconsistent-obj 136.1
No scrub information available for pg 136.1
error 2: (2) No such file or directory

ceph pg deep-scrub 136.1
instructing pg 136.1 on osd.8 to deep-scrub

… so far nothing has changed; list-inconsistent-obj still does not show any information (did I miss some CLI arguments?)

Usually, we simply do a
ceph pg repair 136.1
which most of the time silently does whatever it is supposed to do, and the error disappears. Shortly after, it reappears at random, with some other (or the same) pg out of the rgw.buckets.index pool…

Strange that you don't see any actual inconsistent objects ...



1)  I usually start by checking which pools actually have inconsistencies, e.g.:

$  for pool in $(rados lspools); do echo "${pool} $(rados list-inconsistent-pg ${pool})"; done

 device_health_metrics []
 .rgw.root []
 zone.rgw.control []
 zone.rgw.meta []
 zone.rgw.log ["5.3","5.5","5.a","5.b","5.10","5.11","5.19","5.1a","5.1d","5.1e"]
 zone.rgw.otp []
 zone.rgw.buckets.index ["7.4","7.5","7.6","7.9","7.b","7.11","7.13","7.14","7.18","7.1e"]
 zone.rgw.buckets.data []
 zone.rgw.buckets.non-ec []

(This output is from just now.) You can see that only metadata pools are affected.
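
On a cluster with many pools, the same loop can be narrowed down to the affected pools only. This is just a sketch: the function name list_inconsistent_pools is made up for illustration, and it assumes that a clean pool prints exactly "[]", as in the output above.

```shell
# Print only the pools that report at least one inconsistent PG.
# Assumption: `rados list-inconsistent-pg` prints "[]" for a clean pool.
list_inconsistent_pools() {
    for pool in $(rados lspools); do
        pgs=$(rados list-inconsistent-pg "${pool}")
        # Skip pools with no output at all or with an empty JSON list.
        if [ -n "${pgs}" ] && [ "${pgs}" != "[]" ]; then
            echo "${pool} ${pgs}"
        fi
    done
}
```

Calling list_inconsistent_pools on the cluster above should then print only the zone.rgw.log and zone.rgw.buckets.index lines.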


2)  I then simply looped over those pgs with "rados list-inconsistent-obj $pg". Below are the object.name, errors and last_reqid of each inconsistent object:


 "data_log.14","omap_digest_mismatch","client.4349063.0:12045734"
 "data_log.59","omap_digest_mismatch","client.4364800.0:11773451"
 "data_log.30","omap_digest_mismatch","client.4349063.0:10935030"
 "data_log.42","omap_digest_mismatch","client.4348139.0:112695680"
 "data_log.63","omap_digest_mismatch","client.4348139.0:116876563"
 "data_log.44","omap_digest_mismatch","client.4349063.0:11358410"
 "data_log.11","omap_digest_mismatch","client.4349063.0:10259566"
 "data_log.61","omap_digest_mismatch","client.4349063.0:10259594"
 "data_log.28","omap_digest_mismatch","client.4349063.0:11358396"
 "data_log.39","omap_digest_mismatch","client.4349063.0:11364174"
 "data_log.55","omap_digest_mismatch","client.4349063.0:11358415"
 "data_log.15","omap_digest_mismatch","client.4364800.0:9518143"
 "data_log.27","omap_digest_mismatch","client.4349063.0:11473205"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.6","omap_digest_mismatch","client.4349063.0:11274164"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.1","omap_digest_mismatch","client.4349063.0:12168097"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.10","omap_digest_mismatch","client.4348139.0:112993744"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2202949.678.0","omap_digest_mismatch","client.4349063.0:10289913"
 ".dir.9cba42a3-dd1c-46d4-bdd2-ef634d12c0a5.56337947.1562","omap_digest_mismatch","client.4364800.0:10934595"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.9","omap_digest_mismatch","client.4349063.0:10431941"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.0","omap_digest_mismatch","client.4349063.0:10431932"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2202949.678.10","omap_digest_mismatch","client.4349063.0:10460106"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.1163207.114.8","omap_digest_mismatch","client.4349063.0:11696943"
 ".dir.06f9b7c7-6326-4a41-9115-d4d092cf74ce.2217176.214.0","omap_digest_mismatch","client.4349063.0:9845513"
 ".dir.9cba42a3-dd1c-46d4-bdd2-ef634d12c0a5.61963196.333.1","omap_digest_mismatch","client.4364800.0:9593089"


As you can see, it's always some omap data that suffers from inconsistencies.
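
For reference, the loop from step 2 could look roughly like the sketch below. It is not necessarily the exact command used above: it assumes jq is installed, the function names are made up for illustration, and the JSON paths (.inconsistents[], .object.name, .errors, .selected_object_info.last_reqid) are what recent releases appear to emit, so please check them against the output of your own version.

```shell
# Read `rados list-inconsistent-obj <pg> --format=json` output on stdin and
# print one CSV row per inconsistent object: name, errors, last_reqid.
# Assumption: the JSON layout matches the paths used here; a missing or
# non-object selected_object_info yields an empty last_reqid column.
summarize_inconsistent_objs() {
    jq -r '.inconsistents[]
           | [ .object.name,
               (.errors | join(",")),
               (.selected_object_info.last_reqid? // "") ]
           | @csv'
}

# Loop over every inconsistent pg of one pool and summarize its objects.
report_pool() {
    pool=$1
    for pg in $(rados list-inconsistent-pg "${pool}" | jq -r '.[]'); do
        rados list-inconsistent-obj "${pg}" --format=json | summarize_inconsistent_objs
    done
}
```

Something like "report_pool zone.rgw.log" would then print rows in the same shape as the list above. A pg without scrub information produces no JSON on stdout and therefore no rows.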




Regards


Christian