Re: pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"


 



Hi Eugen, thank you for the reply.

The OSDs were drained over the weekend, so OSD 223 and 269 now only hold the problematic PG 404.bc.

I don't think moving the PG would help, since I don't have any empty OSDs to move it to, and a move would not fix the hash mismatch. The reason I want only the problematic PG left on those OSDs is to reduce recovery time. Since repair doesn't fix it, I would need to set min_size to 4 on the EC 4+2 pool and stop both OSDs at the same time to force a rebuild of the corrupted parts of the PG that are on osd.223 and osd.269.
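
For the record, something like this is what I have in mind (just a sketch, untested; the orch commands assume the OSDs are cephadm-managed, which the container names in the logs suggest, otherwise I would stop the systemd units on the hosts, and I assume min_size is currently at the EC default of k+1=5):

   # let the PG stay active with only the 4 healthy shards
   ceph osd pool set default.rgw.buckets.data min_size 4

   # stop both suspect OSDs so s0 and s2 have to be rebuilt from the others
   ceph orch daemon stop osd.223
   ceph orch daemon stop osd.269

   # watch recovery, then verify with a deep-scrub
   ceph pg 404.bc query
   ceph pg deep-scrub 404.bc

   # put min_size back when done
   ceph osd pool set default.rgw.buckets.data min_size 5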

I'm debating with myself whether I should
1. Stop both OSD 223 and 269, or
2. Stop just one of them.

Stopping them both, I'm guaranteed that the parts of the PG on 223 and 269 are rebuilt from the 4 others, 297, 276, 136 and 197, which don't have any errors.

OSD 223 is the primary in the EC PG, pg 404.bc acting [223,297,269,276,136,197]. So maybe just stop that one, wait for recovery and then run a deep-scrub to check if things look better.
But would it then use the corrupted data on osd.269 to rebuild?
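
The single-OSD variant would be something like this, I guess (same caveats as the sketch above):

   ceph orch daemon stop osd.223
   # wait until pg 404.bc is active+clean again
   ceph pg 404.bc query
   # then check whether the errors are gone
   ceph pg deep-scrub 404.bc
   ceph health detail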


-
Kai Stian Olstad



On 26.02.2024 10:19, Eugen Block wrote:
Hi,

I think your approach makes sense. But I'm wondering if moving only the problematic PGs to different OSDs could have an effect as well. I assume that moving the 2 PGs is much quicker than moving all BUT those 2 PGs. If that doesn't work you could still fall back to draining the entire OSDs (except for the problematic PG).
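
Moving just that PG could be done with an explicit upmap, roughly like the sketch below (osd.250 and osd.251 are placeholders for OSDs with enough free space; the mapping is only honoured if it still satisfies the CRUSH rule):

   # pin pg 404.bc away from the two suspect OSDs
   # (requires require-min-compat-client luminous or later)
   ceph osd pg-upmap-items 404.bc 223 250 269 251

   # and remove the mapping again afterwards
   ceph osd rm-pg-upmap-items 404.bc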

Regards,
Eugen

Quoting Kai Stian Olstad <ceph+list@xxxxxxxxxx>:

Hi,

Does no one have any comment at all?
I'm not picky, so any speculation or guessing ("I would", "I wouldn't", "should work" and so on) would be highly appreciated.


Since 4 out of 6 shards in the EC 4+2 are OK and ceph pg repair doesn't solve it, I think the following might work.

pg 404.bc acting [223,297,269,276,136,197]

- Use pgremapper to move all PGs on OSD 223 and 269, except 404.bc, to other OSDs.
- Set min_size to 4: ceph osd pool set default.rgw.buckets.data min_size 4
- Stop osd.223 and osd.269

What I hope will happen is that Ceph then recreates the 404.bc shards s0 (osd.223) and s2 (osd.269), since they are now down, from the remaining shards s1 (osd.297), s3 (osd.276), s4 (osd.136) and s5 (osd.197).
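
To double-check before stopping them, the PGs still left on each OSD can be listed with:

   ceph pg ls-by-osd 223
   ceph pg ls-by-osd 269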


_Any_ comment is highly appreciated.

-
Kai Stian Olstad


On 21.02.2024 13:27, Kai Stian Olstad wrote:
Hi,

Short summary

PG 404.bc is an EC 4+2 PG where s0 and s2 report a hash mismatch for 698 objects. Ceph pg repair doesn't fix it: if you run deep-scrub on the PG after the repair has finished, it still reports scrub errors.

Why can't ceph pg repair fix this? With 4 out of 6 shards intact it should be able to reconstruct the corrupted shards. Is there a way to fix this, like deleting the s0 and s2 copies of the objects so they are forced to be recreated?
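
If deleting the shards is a viable route, I guess it would look something like this with ceph-objectstore-tool, run on the host with the OSD stopped (completely untested sketch; the object spec would have to be taken from the --op list output, and on a cephadm host I'd presumably get at the tool and data path via cephadm shell --name osd.223):

   # with osd.223 stopped
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-223 \
       --pgid 404.bcs0 --op list
   # then, for each affected object, remove the shard copy
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-223 \
       --pgid 404.bcs0 '<object json from the list above>' remove

The same on osd.269 with --pgid 404.bcs2, followed by starting the OSDs again and running a repair/deep-scrub. But I don't know if that is safe, hence the question.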


Long detailed summary

A short backstory.
* This is the aftermath of the problems with mclock described in the post "17.2.7: Backfilling deadlock / stall / stuck / standstill" [1].
 - 4 OSDs had a few bad sectors; all 4 were set out and the cluster stopped.
 - The solution was to switch from mclock to wpq and restart all OSDs.
 - When all backfilling was finished, all 4 OSDs were replaced.
 - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


Pool 404, which PG 404.bc belongs to, is the EC 4+2 pool default.rgw.buckets.data.

9 days after osd.223 and osd.269 were replaced, a deep-scrub was run and reported errors:
   ceph status
   -----------
   HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg inconsistent
   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
       pg 404.bc is active+clean+inconsistent, acting [223,297,269,276,136,197]

I then ran repair:
   ceph pg repair 404.bc

And ceph status showed this
   ceph status
   -----------
   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
       osd.223 had 698 reads repaired
       osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks, and they show no SMART errors or any I/O errors in the OS logs.
So I tried to run deep-scrub again on the PG:
   ceph pg deep-scrub 404.bc

And got this result.

   ceph status
   -----------
   HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs; Possible data damage: 1 pg inconsistent
   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
       osd.223 had 698 reads repaired
       osd.269 had 698 reads repaired
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
       pg 404.bc is active+clean+scrubbing+deep+inconsistent+repair, acting [223,297,269,276,136,197]

698 + 698 = 1396, so the same number of errors.

I ran repair again on 404.bc and ceph status showed:

   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
       osd.223 had 1396 reads repaired
       osd.269 had 1396 reads repaired

So even when the repair finishes it doesn't fix the problem, since the errors reappear after a deep-scrub.
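
The flagged objects and shards can also be listed directly from the last deep-scrub, which is easier than grepping the OSD logs:

   rados list-inconsistent-obj 404.bc --format=json-pretty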

The logs for osd.223 and osd.269 contain "got incorrect hash on read" and "candidate had an ec hash mismatch" for 698 unique objects. I only show the log for 1 of the 698 objects; the log is the same for the other 697.

osd.223 log (only showing 1 of the 698 objects, named 2021-11-08T19%3a43%3a50,145489260+00%3a00)
   -----------
Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch: 231235 pg[404.bcs0( v 231235'1636919 (231078'1632435,231235'1636919] local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=0 lpr=226263 crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918 active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0: REQ_SCRUB ] MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned REQ_SCRUB] _scan_list 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0xc5d1dd1b != expected 0x7c2f86d7
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch

osd.269 log (only showing 1 of the 698 objects, named 2021-11-08T19%3a43%3a50,145489260+00%3a00)
   -----------
Feb 20 10:31:00 ceph-hd-001 ceph-osd[3656897]: osd.269 pg_epoch: 231235 pg[404.bcs2( v 231235'1636919 (231078'1632435,231235'1636919] local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=2 lpr=226263 luod=0'0 crt=231235'1636919 mlcod 231235'1636919 active mbc={}] _scan_list 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0x7c0871dc != expected 0xcf6f4c58

The logs for the other OSDs in the PG, osd.297, osd.276, osd.136 and osd.197, don't show any errors.

If I try to get the object, it fails:
   $ s3cmd s3://benchfiles/2021-11-08T19:43:50,145489260+00:00
   download: 's3://benchfiles/2021-11-08T19:43:50,145489260+00:00' -> './2021-11-08T19:43:50,145489260+00:00' [1 of 1]
   ERROR: Download of './2021-11-08T19:43:50,145489260+00:00' failed (Reason: 500 (UnknownError))
   ERROR: S3 error: 500 (UnknownError)

And the RGW log shows this:
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new request req=0x7f94b744d660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: WARNING: set_req_state_err err_no=5 resorting to 500
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new request req=0x7f94b6e41660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== req done req=0x7f94b744d660 op status=-5 http_status=500 latency=0.020000568s ======
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: beast: 0x7f94b744d660: 110.2.0.46 - test1 [21/Feb/2024:08:27:06.021 +0000] "GET /benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00 HTTP/1.1" 500 226 - - - latency=0.020000568s
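
To confirm that the failing S3 object is backed by the multipart rados object seen in the OSD logs, its manifest can be dumped (sketch, using the bucket and object name from above):

   radosgw-admin object stat --bucket=benchfiles \
       --object='2021-11-08T19:43:50,145489260+00:00'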

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#OE3QXLAJIY6NU7PNMGHP47UK2CBZJPUG

--
Kai Stian Olstad


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


