Re: pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

Hi,

I think your approach makes sense. But I'm wondering whether moving only the problematic PG to different OSDs could work as well. I assume that moving those 2 shards is much quicker than moving everything BUT those 2. If that doesn't work you could still fall back to draining the entire OSDs (except for the problematic PG).
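Something like this should do the move (untested sketch; 300 and 301 are just placeholder target OSDs in the right failure domains, and pgremapper basically generates the same kind of upmap entries):

   # Remap only pg 404.bc away from osd.223 and osd.269 with an upmap exception
   ceph osd pg-upmap-items 404.bc 223 300 269 301

   # Drop the exception again later
   ceph osd rm-pg-upmap-items 404.bc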

Regards,
Eugen

Quoting Kai Stian Olstad <ceph+list@xxxxxxxxxx>:

Hi,

Does no one have any comment at all?
I'm not picky, so any speculation, guesses, "I would", "I wouldn't", "should work" and so on would be highly appreciated.


Since 4 out of 6 shards in the EC 4+2 are OK and ceph pg repair doesn't solve it, I think the following might work.

pg 404.bc acting [223,297,269,276,136,197]

- Use pgremapper to move all PGs on osd.223 and osd.269, except 404.bc, to other OSDs.
- Set min_size to 4: ceph osd pool set default.rgw.buckets.data min_size 4
- Stop osd.223 and osd.269

What I hope will happen is that Ceph will then recreate the 404.bc shards s0 (osd.223) and s2 (osd.269), since they are now down, from the remaining shards
s1 (osd.297), s3 (osd.276), s4 (osd.136) and s5 (osd.197).
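
Roughly the commands I have in mind (only a sketch, not tested; the unit names are the cephadm ones taken from the OSD logs quoted below, adjust to your deployment):

   # Lower min_size so the PG can stay active with only 4 shards
   ceph osd pool set default.rgw.buckets.data min_size 4

   # After pgremapper has emptied the OSDs of everything except 404.bc,
   # stop the two OSDs holding the bad shards
   systemctl stop ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0@osd.223.service   # on ceph-hd-003
   systemctl stop ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0@osd.269.service   # on ceph-hd-001

   # Mark them out so recovery of s0 and s2 starts right away instead of
   # waiting for mon_osd_down_out_interval
   ceph osd out 223 269

   # Watch the recovery of pg 404.bc
   ceph pg 404.bc query
   ceph -s

   # Afterwards restore min_size (assuming it was at the default k+1 = 5)
   ceph osd pool set default.rgw.buckets.data min_size 5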


_Any_ comment is highly appreciated.

-
Kai Stian Olstad


On 21.02.2024 13:27, Kai Stian Olstad wrote:
Hi,

Short summary

PG 404.bc is an EC 4+2 where s0 and s2 report a hash mismatch for 698 objects.
ceph pg repair doesn't fix it: if you run deep-scrub on the PG after the repair has finished, it still reports scrub errors.

Why can't ceph pg repair fix this? With 4 out of 6 shards intact it should be able to reconstruct the corrupted ones. Is there a way to force it, like deleting the s0 and s2 shards of the objects so Ceph has to recreate them?
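
What I am thinking of is something like this with ceph-objectstore-tool (only a sketch, I have not tried it; the data path is for a non-containerized OSD, with cephadm it would have to be run via cephadm shell, and the OSD must be stopped first):

   # On ceph-hd-003 with osd.223 stopped: find the exact object in shard s0
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-223 \
       --pgid 404.bcs0 --op list | grep '2021-11-08T19'

   # Remove the s0 shard of that object (the JSON line from the list output
   # is passed as the object argument), then the same on osd.269 for 404.bcs2
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-223 \
       --pgid 404.bcs0 '<json-object-from-list-output>' remove

   # Start the OSDs again and let repair/deep-scrub rebuild the shards
   ceph pg repair 404.bc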


Long detailed summary

A short backstory.
* This is the aftermath of the mclock problems described in the post "17.2.7: Backfilling deadlock / stall / stuck / standstill" [1].
 - 4 OSDs had a few bad sectors; all 4 were set out and the cluster stopped.
 - The solution was to switch from mclock to wpq and restart all OSDs.
 - When all backfilling was finished, all 4 OSDs were replaced.
 - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


PG / pool 404 is EC 4+2 default.rgw.buckets.data

9 days after osd.223 and osd.269 were replaced, deep-scrub was run and reported errors
   ceph status
   -----------
   HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg inconsistent
   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is active+clean+inconsistent, acting [223,297,269,276,136,197]

I then ran repair
   ceph pg repair 404.bc

And ceph status showed this
   ceph status
   -----------
   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
       osd.223 had 698 reads repaired
       osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks, and the disks have no SMART errors or any I/O errors in the OS logs.
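(For reference, these are the kind of checks I mean; /dev/sdX is a placeholder for the actual device behind each OSD.)

   smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'
   dmesg -T | grep -iE 'I/O error|blk_update_request'
   journalctl -k --since '10 days ago' | grep -iE 'I/O error'
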
So I tried to run deep-scrub again on the PG.
   ceph pg deep-scrub 404.bc

And got this result.

   ceph status
   -----------
HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs; Possible data damage: 1 pg inconsistent
   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
       osd.223 had 698 reads repaired
       osd.269 had 698 reads repaired
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is active+clean+scrubbing+deep+inconsistent+repair, acting [223,297,269,276,136,197]

698 + 698 = 1396, so the same number of errors.

I ran repair again on 404.bc, and ceph status then showed

   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
       osd.223 had 1396 reads repaired
       osd.269 had 1396 reads repaired

So even when the repair finishes it doesn't fix the problem, since the errors reappear after a deep-scrub.
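
For what it's worth, the scrub errors can also be dumped directly, which should show exactly which shards are flagged (sketch; jq is only used to summarise the JSON):

   # Objects and shards flagged by the last deep-scrub of the PG
   rados list-inconsistent-obj 404.bc --format=json-pretty | less

   # Count the flagged shards per OSD (needs jq)
   rados list-inconsistent-obj 404.bc --format=json \
       | jq -r '.inconsistents[].shards[] | select((.errors | length) > 0) | .osd' \
       | sort | uniq -c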

The logs for osd.223 and osd.269 contain "got incorrect hash on read" and "candidate had an ec hash mismatch" for 698 unique objects. I only show the log for 1 of the 698 objects; the log is the same for the other 697.

osd.223 log (only showing 1 of the 698 objects, named 2021-11-08T19%3a43%3a50,145489260+00%3a00)
   -----------
Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch: 231235 pg[404.bcs0( v 231235'1636919 (231078'1632435,231235'1636919] local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=0 lpr=226263 crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918 active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0: REQ_SCRUB ] MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned REQ_SCRUB] _scan_list 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0xc5d1dd1b != expected 0x7c2f86d7
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch

osd.269 log (only showing 1 of the 698 objects, named 2021-11-08T19%3a43%3a50,145489260+00%3a00)
   -----------
Feb 20 10:31:00 ceph-hd-001 ceph-osd[3656897]: osd.269 pg_epoch: 231235 pg[404.bcs2( v 231235'1636919 (231078'1632435,231235'1636919] local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=2 lpr=226263 luod=0'0 crt=231235'1636919 mlcod 231235'1636919 active mbc={}] _scan_list 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0x7c0871dc != expected 0xcf6f4c58

The logs for the other OSDs in the PG, osd.297, osd.276, osd.136 and osd.197, don't show any errors.
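(For completeness, this is roughly how the other OSDs can be checked; the unit name pattern is the cephadm one from the logs above and has to be run on the host carrying each OSD.)

   journalctl -u ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0@osd.297.service \
       --since '2024-02-20' \
       | grep -E 'got incorrect hash on read|ec hash mismatch'
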

If I try to get the object, it fails.
   $ s3cmd get s3://benchfiles/2021-11-08T19:43:50,145489260+00:00
   download: 's3://benchfiles/2021-11-08T19:43:50,145489260+00:00' -> './2021-11-08T19:43:50,145489260+00:00' [1 of 1]
   ERROR: Download of './2021-11-08T19:43:50,145489260+00:00' failed (Reason: 500 (UnknownError))
   ERROR: S3 error: 500 (UnknownError)

And the RGW log shows this
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new request req=0x7f94b744d660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: WARNING: set_req_state_err err_no=5 resorting to 500
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new request req=0x7f94b6e41660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== req done req=0x7f94b744d660 op status=-5 http_status=500 latency=0.020000568s ======
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: beast: 0x7f94b744d660: 110.2.0.46 - test1 [21/Feb/2024:08:27:06.021 +0000] "GET /benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00 HTTP/1.1" 500 226 - - - latency=0.020000568s
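
If it helps, the mapping from the S3 object to the broken RADOS multipart part can be checked like this (sketch; the manifest output tells which parts to stat):

   # Show the S3 object's manifest, i.e. the RADOS multipart parts behind it
   radosgw-admin object stat --bucket=benchfiles \
       --object='2021-11-08T19:43:50,145489260+00:00'

   # Then stat one of the parts directly in the data pool
   rados -p default.rgw.buckets.data stat '<part-name-from-the-manifest>'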

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#OE3QXLAJIY6NU7PNMGHP47UK2CBZJPUG

--
Kai Stian Olstad


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


