Re: pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

Hi,

Does no one have any comment at all?
I'm not picky, so any speculation, guess, "I would", "I wouldn't", "should work" and so on would be highly appreciated.


Since 4 of the 6 shards in this EC 4+2 PG are OK and ceph pg repair doesn't solve it, I think the following might work.

pg 404.bc acting [223,297,269,276,136,197]

- Use pgremapper to move all PGs on osd.223 and osd.269, except 404.bc, to other OSDs.
- Set min_size to 4: ceph osd pool set default.rgw.buckets.data min_size 4
- Stop osd.223 and osd.269.

What I hope will happen is that, since they are now down, Ceph will recreate shards s0 (osd.223) and s2 (osd.269) of 404.bc from the remaining shards
s1 (osd.297), s3 (osd.276), s4 (osd.136) and s5 (osd.197).
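The shard arithmetic behind this plan can be sketched in Python. This is a hypothetical model, not Ceph code: it assumes the pool uses a maximum-distance-separable code such as Reed-Solomon, where any k of the k+m shards are enough to decode, and that position i in the acting set holds shard s<i>, which matches the OSD logs further down (osd.223 is 404.bcs0, osd.269 is 404.bcs2).

```python
# Hypothetical model of EC 4+2 shard availability for PG 404.bc (not Ceph code).
K, M = 4, 2          # data and coding shards of the default.rgw.buckets.data profile
MIN_SIZE = 4         # min_size proposed in the plan above

# In an EC acting set, list position i holds shard s<i>.
ACTING = [223, 297, 269, 276, 136, 197]
STOPPED = {223, 269}  # the OSDs the plan would stop

def surviving_shards(acting, stopped):
    """Return {shard_index: osd_id} for shards whose OSD is still up."""
    return {i: osd for i, osd in enumerate(acting) if osd not in stopped}

def can_reconstruct(n_available, k):
    """With an MDS code (e.g. Reed-Solomon), any k intact shards suffice to decode."""
    return n_available >= k

up = surviving_shards(ACTING, STOPPED)
print(up)                                                # {1: 297, 3: 276, 4: 136, 5: 197}
print(can_reconstruct(len(up), K), len(up) >= MIN_SIZE)  # True True
print(len(ACTING) - K)                                   # 2 shards may be lost (== M)
```

With only k = 4 shards left, the PG has no redundancy while s0 and s2 are rebuilt; one more failure in that window would leave 404.bc unreadable.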


_Any_ comment is highly appreciated.

-
Kai Stian Olstad


On 21.02.2024 13:27, Kai Stian Olstad wrote:
Hi,

Short summary

PG 404.bc is in an EC 4+2 pool, and its shards s0 and s2 report a hash mismatch for 698 objects. ceph pg repair doesn't fix it: if I run a deep-scrub on the PG after the repair has finished, it still reports scrub errors.

Why can't ceph pg repair fix this? With 4 of the 6 shards intact, it should be able to reconstruct the corrupted shards. Is there a way to fix this, for example by deleting the s0 and s2 copies of the objects so that Ceph is forced to recreate them?


Long detailed summary

A short backstory.
* This is the aftermath of problems with mclock, see the post "17.2.7: Backfilling deadlock / stall / stuck / standstill" [1].
  - 4 OSDs had a few bad sectors; I set all 4 out and the cluster stopped.
  - The solution was to switch from mclock to wpq and restart all OSDs.
  - When all backfilling was finished, all 4 OSDs were replaced.
  - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


Pool 404 is default.rgw.buckets.data, an EC 4+2 pool.

9 days after osd.223 and osd.269 were replaced, a deep-scrub ran and reported errors:
    ceph status
    -----------
    HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg inconsistent
    [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
    [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
        pg 404.bc is active+clean+inconsistent, acting [223,297,269,276,136,197]

I then ran a repair:
    ceph pg repair 404.bc

And ceph status showed this:
    ceph status
    -----------
    HEALTH_WARN Too many repaired reads on 2 OSDs
    [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
        osd.223 had 698 reads repaired
        osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks, and they show no SMART errors and no I/O errors in the OS logs.
So I ran a deep-scrub on the PG again:
    ceph pg deep-scrub 404.bc

And got this result:

    ceph status
    -----------
    HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs; Possible data damage: 1 pg inconsistent
    [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
    [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
        osd.223 had 698 reads repaired
        osd.269 had 698 reads repaired
    [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
        pg 404.bc is active+clean+scrubbing+deep+inconsistent+repair, acting [223,297,269,276,136,197]

698 + 698 = 1396, so it is the same number of errors.

I ran repair on 404.bc again, and ceph status is now:

    HEALTH_WARN Too many repaired reads on 2 OSDs
    [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
        osd.223 had 1396 reads repaired
        osd.269 had 1396 reads repaired

So even when the repair finishes, it doesn't fix the problem, since the errors reappear after the next deep-scrub.
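The numbers above fit together as follows. This Python sketch only restates the counts from ceph status; it assumes one scrub error per (object, bad shard) pair and that the repaired-read counter grows by 698 per OSD on every repair run, which is what the two repairs shown here suggest but is not confirmed from Ceph sources.

```python
# Counts copied from the ceph status output above; the model itself is an assumption.
MISMATCHED_OBJECTS = 698        # unique objects with an EC hash mismatch per bad shard
BAD_SHARDS = {223: 0, 269: 2}   # osd id -> EC shard index (s0 and s2)

def expected_scrub_errors(objects=MISMATCHED_OBJECTS, bad_shards=BAD_SHARDS):
    """Assumed: one scrub error per (object, bad shard) pair."""
    return objects * len(bad_shards)

def repaired_reads(repair_runs, objects=MISMATCHED_OBJECTS):
    """Assumed: each repair run 'repairs' every bad object once on each bad OSD."""
    return repair_runs * objects

print(expected_scrub_errors())               # 1396
print(repaired_reads(1), repaired_reads(2))  # 698 1396
```

The coincidence that both the scrub-error total and the per-OSD repaired-read count after the second repair equal 1396 comes from the same factor of 2: two bad shards in one case, two repair runs in the other.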

The logs for osd.223 and osd.269 contain "got incorrect hash on read" and "candidate had an ec hash mismatch" for 698 unique objects. I only show the log for 1 of the 698 objects; it is the same for the other 697.

osd.223 log (only showing 1 of the 698 objects, the one named 2021-11-08T19%3a43%3a50,145489260+00%3a00)
    -----------
Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch: 231235 pg[404.bcs0( v 231235'1636919 (231078'1632435,231235'1636919] local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=0 lpr=226263 crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918 active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0: REQ_SCRUB ] MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned REQ_SCRUB] _scan_list 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0xc5d1dd1b != expected 0x7c2f86d7
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash mismatch
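The 698 unique objects can be counted from cluster log lines like the ones above. In the osd.223 excerpt each error appears twice (once under ceph-osd and once under the container unit name), so counting distinct soids per shard avoids double counting. A hypothetical Python sketch that only parses text and does not talk to Ceph; it assumes object names contain no spaces, which holds for these logs.

```python
import re
from collections import defaultdict

# Written against lines such as:
# "... 404.bc shard 223(0) soid 404:...:head : candidate had an ec hash mismatch"
MISMATCH_RE = re.compile(
    r"(?P<pg>\d+\.[0-9a-f]+) shard (?P<osd>\d+)\((?P<shard>\d+)\) "
    r"soid (?P<soid>\S+) : candidate had an ec hash mismatch"
)

def mismatches_by_shard(lines):
    """Return {(osd_id, shard_index): set_of_soids}; duplicate log copies count once."""
    found = defaultdict(set)
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            found[(int(m["osd"]), int(m["shard"]))].add(m["soid"])
    return dict(found)

# Usage with shortened, hypothetical object names ("obj").
sample = [
    "404.bc shard 223(0) soid 404:3d001f95:::obj:head : candidate had an ec hash mismatch",
    "404.bc shard 223(0) soid 404:3d001f95:::obj:head : candidate had an ec hash mismatch",
    "404.bc shard 269(2) soid 404:3d001f95:::obj:head : candidate had an ec hash mismatch",
]
counts = {k: len(v) for k, v in mismatches_by_shard(sample).items()}
print(counts)  # {(223, 0): 1, (269, 2): 1}
```

Run over the full OSD logs, each of the two keys would be expected to hold 698 soids if the logs match the scrub error count.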

osd.269 log (only showing 1 of the 698 objects, the one named 2021-11-08T19%3a43%3a50,145489260+00%3a00)
    -----------
Feb 20 10:31:00 ceph-hd-001 ceph-osd[3656897]: osd.269 pg_epoch: 231235 pg[404.bcs2( v 231235'1636919 (231078'1632435,231235'1636919] local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=2 lpr=226263 luod=0'0 crt=231235'1636919 mlcod 231235'1636919 active mbc={}] _scan_list 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0x7c0871dc != expected 0xcf6f4c58
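The per-read message on each OSD carries the shard index in the pg[404.bcsN( token along with both hash values. A hypothetical sketch of extracting them; again this is plain text parsing, and the regular expression is written only against the two _scan_list lines shown here, so other Ceph versions or log formats may need a different pattern.

```python
import re

# Written against lines such as:
# "... pg[404.bcs2( ... _scan_list <soid> got incorrect hash on read 0x7c0871dc != expected 0xcf6f4c58"
READ_HASH_RE = re.compile(
    r"pg\[(?P<pg>\d+\.[0-9a-f]+)s(?P<shard>\d+)\(.*?_scan_list (?P<soid>\S+) "
    r"got incorrect hash on read 0x(?P<got>[0-9a-f]+) != expected 0x(?P<expected>[0-9a-f]+)"
)

def parse_read_hash_error(line):
    """Return (pg, shard, soid, got, expected) or None if the line doesn't match."""
    m = READ_HASH_RE.search(line)
    if m is None:
        return None
    return (m["pg"], int(m["shard"]), m["soid"],
            int(m["got"], 16), int(m["expected"], 16))

# Shortened, hypothetical version of the osd.269 line above.
line = ("osd.269 pg_epoch: 231235 pg[404.bcs2( v 231235'1636919 ...) r=2 "
        "active mbc={}] _scan_list 404:3d001f95:::obj:head "
        "got incorrect hash on read 0x7c0871dc != expected 0xcf6f4c58")
pg, shard, soid, got, expected = parse_read_hash_error(line)
print(pg, shard, hex(got), hex(expected))  # 404.bc 2 0x7c0871dc 0xcf6f4c58
```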

The logs for the other OSDs in the PG (osd.297, osd.276, osd.136 and osd.197) don't show any errors.

If I try to get the object, it fails:
    $ s3cmd get s3://benchfiles/2021-11-08T19:43:50,145489260+00:00
    download: 's3://benchfiles/2021-11-08T19:43:50,145489260+00:00' -> './2021-11-08T19:43:50,145489260+00:00' [1 of 1]
    ERROR: Download of './2021-11-08T19:43:50,145489260+00:00' failed (Reason: 500 (UnknownError))
    ERROR: S3 error: 500 (UnknownError)

And the RGW log shows this:
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new request req=0x7f94b744d660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: WARNING: set_req_state_err err_no=5 resorting to 500
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new request req=0x7f94b6e41660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== req done req=0x7f94b744d660 op status=-5 http_status=500 latency=0.020000568s ======
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: beast: 0x7f94b744d660: 110.2.0.46 - test1 [21/Feb/2024:08:27:06.021 +0000] "GET /benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00 HTTP/1.1" 500 226 - - - latency=0.020000568s
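To tie the failed GET to the damaged RADOS object, the two spellings of the name can be decoded and compared. A minimal Python sketch, assuming the %3a sequences in the OSD log are hex escapes of ":" and the RGW request path is ordinary URL encoding; it also shows that err_no=5 / op status=-5 is EIO on Linux, which RGW turned into HTTP 500 here.

```python
import errno
from urllib.parse import unquote

# Object name as printed in the OSD log (between "404:3d001f95:::" and ":head").
SOID_NAME = ("1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_"
             "2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1")
# Request path from the beast line in the RGW log.
REQUEST_PATH = "/benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00"

bucket, raw_key = REQUEST_PATH.lstrip("/").split("/", 1)
# unquote() (unlike unquote_plus()) leaves a literal '+' alone, so "+00:00" survives.
key = unquote(raw_key)

print(bucket, key)                          # benchfiles 2021-11-08T19:43:50,145489260+00:00
print(key in unquote(SOID_NAME))            # True
print(-5 == -errno.EIO, errno.errorcode[5]) # True EIO  (Linux errno values)
```

This only shows that the S3 key and the multipart RADOS object share the same name; it does not by itself prove that this part is the one RGW failed to read.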

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#OE3QXLAJIY6NU7PNMGHP47UK2CBZJPUG

--
Kai Stian Olstad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx