Hi,
I think your approach makes sense. But I'm wondering if moving only
the problematic PGs to different OSDs could have an effect as well. I
assume that moving the 2 PGs is much quicker than moving all BUT those
2 PGs. If that doesn't work you could still fall back to draining the
entire OSDs (except for the problematic PG).
Regards,
Eugen
Quoting Kai Stian Olstad <ceph+list@xxxxxxxxxx>:
Hi,
Does no one have any comment at all?
I'm not picky, so any speculation, guessing, "I would", "I wouldn't",
"should work" and so on would be highly appreciated.
Since 4 out of 6 in EC 4+2 is OK and ceph pg repair doesn't solve it,
I think the following might work.
pg 404.bc acting [223,297,269,276,136,197]
- Use pgremapper to move all PGs on osd.223 and osd.269, except 404.bc, to
other OSDs.
- Set min_size to 4: ceph osd pool set default.rgw.buckets.data min_size 4
- Stop osd.223 and osd.269
What I hope will happen is that Ceph then recreates the 404.bc shards
s0 (osd.223) and s2 (osd.269), since they are now down, from the
remaining shards
s1 (osd.297), s3 (osd.276), s4 (osd.136) and s5 (osd.197).
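The shard arithmetic behind this plan can be sketched in a few lines of Python. This is only an illustration, not a Ceph tool: the helper names are hypothetical, and the comparison against min_size 5 assumes the pool currently uses the usual k+1 value, which the thread does not state.

```python
# Acting set of PG 404.bc as quoted above; list index = shard id (s0..s5).
ACTING = [223, 297, 269, 276, 136, 197]
K, M = 4, 2  # EC 4+2: any K of the K+M shards are enough to rebuild the rest


def surviving_shards(acting, stopped):
    """Return {shard_id: osd_id} for every shard whose OSD is not stopped."""
    return {shard: osd for shard, osd in enumerate(acting) if osd not in stopped}


def meets_min_size(acting, stopped, min_size):
    """True if the number of remaining shards still satisfies min_size."""
    return len(surviving_shards(acting, stopped)) >= min_size


remaining = surviving_shards(ACTING, {223, 269})
print(remaining)                                        # shards s1, s3, s4, s5
print(len(remaining) >= K)                              # enough to reconstruct
print(meets_min_size(ACTING, {223, 269}, min_size=4))   # with the lowered value
print(meets_min_size(ACTING, {223, 269}, min_size=5))   # assumed current value
```

With both OSDs stopped, exactly 4 shards remain, which is why min_size has to be lowered to 4 before stopping them.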
_Any_ comment is highly appreciated.
--
Kai Stian Olstad
On 21.02.2024 13:27, Kai Stian Olstad wrote:
Hi,
Short summary
PG 404.bc is an EC 4+2 where s0 and s2 report hash mismatch for 698 objects.
ceph pg repair doesn't fix it, because if you run a deep-scrub on the
PG after the repair is finished, it still reports scrub errors.
Why can't ceph pg repair repair this? With 4 out of 6 shards intact it
should be able to reconstruct the corrupted shards.
Is there a way to fix this? Like deleting the objects on s0 and s2 so
Ceph is forced to recreate them?
Long detailed summary
A short backstory.
* This is the aftermath of problems with mclock, see the post "17.2.7:
Backfilling deadlock / stall / stuck / standstill" [1].
- 4 OSDs had a few bad sectors; I set all 4 out and the cluster stopped.
- The solution was to switch from mclock to wpq and restart all OSDs.
- When all backfilling was finished, all 4 OSDs were replaced.
- osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.
PG / pool 404 is EC 4+2 default.rgw.buckets.data
9 days after osd.223 and osd.269 were replaced, a deep-scrub was run
and reported errors
ceph status
-----------
HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg inconsistent
[ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is active+clean+inconsistent, acting
[223,297,269,276,136,197]
I then ran repair
ceph pg repair 404.bc
And ceph status showed this
ceph status
-----------
HEALTH_WARN Too many repaired reads on 2 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 698 reads repaired
osd.269 had 698 reads repaired
But osd.223 and osd.269 are new disks, and the disks have no SMART
errors or any I/O errors in the OS logs.
So I tried to run deep-scrub again on the PG.
ceph pg deep-scrub 404.bc
And got this result.
ceph status
-----------
HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs;
Possible data damage: 1 pg inconsistent
[ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 698 reads repaired
osd.269 had 698 reads repaired
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is
active+clean+scrubbing+deep+inconsistent+repair, acting
[223,297,269,276,136,197]
698 + 698 = 1396, so the same number of errors.
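The same cross-check can be done mechanically. A minimal sketch in Python, assuming the health output is available as plain text in the form quoted above; the function names are made up for this example:

```python
import re

# Lines copied from the health output quoted above.
HEALTH_DETAIL = """\
[ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 698 reads repaired
osd.269 had 698 reads repaired
"""


def repaired_reads(text):
    """Return {osd_id: repaired_read_count} parsed from the health text."""
    pairs = re.findall(r"osd\.(\d+) had (\d+) reads repaired", text)
    return {int(osd): int(count) for osd, count in pairs}


def scrub_errors(text):
    """Return the OSD_SCRUB_ERRORS count, or None if the line is absent."""
    match = re.search(r"OSD_SCRUB_ERRORS: (\d+) scrub errors", text)
    return int(match.group(1)) if match else None


per_osd = repaired_reads(HEALTH_DETAIL)
print(per_osd)
print(sum(per_osd.values()) == scrub_errors(HEALTH_DETAIL))
```

If the two numbers ever diverge, that would suggest errors on shards other than the two that were repaired.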
I ran repair again on 404.bc and ceph status is
HEALTH_WARN Too many repaired reads on 2 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 1396 reads repaired
osd.269 had 1396 reads repaired
So even when repair finishes it doesn't fix the problem, since the
errors reappear after a deep-scrub.
The logs for osd.223 and osd.269 contain "got incorrect hash on
read" and "candidate had an ec hash mismatch" for 698 unique objects.
I only show the logs for 1 of the 698 objects; the log is the
same for the other 697 objects.
osd.223 log (only showing 1 of 698 objects, named
2021-11-08T19%3a43%3a50,145489260+00%3a00)
-----------
Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch:
231235 pg[404.bcs0( v 231235'1636919
(231078'1632435,231235'1636919] local-lis/les=226263/226264
n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0
sis=226263) [223,297,269,276,136,197]p223(0) r=0 lpr=226263
crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918
active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0:
REQ_SCRUB ] MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned
REQ_SCRUB] _scan_list
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0xc5d1dd1b != expected
0x7c2f86d7
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]:
log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash
mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]:
log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash
mismatch
Feb 20 10:31:01 ceph-hd-003
ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]:
2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster)
log [ERR] : 404.bc shard 223(0) soid
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash
mismatch
Feb 20 10:31:01 ceph-hd-003
ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]:
2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster)
log [ERR] : 404.bc shard 269(2) soid
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head : candidate had an ec hash
mismatch
osd.269 log (only showing 1 of 698 objects, named
2021-11-08T19%3a43%3a50,145489260+00%3a00)
-----------
Feb 20 10:31:00 ceph-hd-001 ceph-osd[3656897]: osd.269 pg_epoch:
231235 pg[404.bcs2( v 231235'1636919
(231078'1632435,231235'1636919] local-lis/les=226263/226264
n=296580 ec=36041/27862 lis/c=226263/226263 les/c/f=226264/230954/0
sis=226263) [223,297,269,276,136,197]p223(0) r=2 lpr=226263
luod=0'0 crt=231235'1636919 mlcod 231235'1636919 active mbc={}]
_scan_list
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head got incorrect hash on read 0x7c0871dc != expected
0xcf6f4c58
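Counting the unique objects per OSD from such logs could look roughly like the Python sketch below. It is an illustration under assumptions: the two sample lines are abridged from the logs above (wrapped journal lines re-joined, the PG state between "pg_epoch:" and "_scan_list" cut to "..."), and the regular expression is only tested against that shape.

```python
import re
from collections import defaultdict

# Object name exactly as it appears in the log lines above.
OBJ = ("404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_"
       "2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head")

# Abridged sample lines (see the assumptions in the text above).
LOG_LINES = [
    f"osd.223 pg_epoch: 231235 ... _scan_list {OBJ} "
    "got incorrect hash on read 0xc5d1dd1b != expected 0x7c2f86d7",
    f"osd.269 pg_epoch: 231235 ... _scan_list {OBJ} "
    "got incorrect hash on read 0x7c0871dc != expected 0xcf6f4c58",
]

PATTERN = re.compile(
    r"(osd\.\d+) pg_epoch: .*_scan_list\s+(\S+) got incorrect hash on read "
    r"(0x[0-9a-f]+) != expected\s+(0x[0-9a-f]+)"
)


def mismatches_per_osd(lines):
    """Map each OSD to the set of object names it logged a hash mismatch for."""
    result = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            osd, obj, _got, _expected = match.groups()
            result[osd].add(obj)
    return dict(result)


per_osd = mismatches_per_osd(LOG_LINES)
for osd, objects in sorted(per_osd.items()):
    print(osd, len(objects))
```

Using a set per OSD means repeated scrubs of the same object are counted once, which matches the "698 unique objects" figure being a count of distinct names rather than log lines.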
The logs for the other OSDs in the PG, osd.297, osd.276, osd.136 and
osd.197, don't show any errors.
If I try to get the object it fails
$ s3cmd get s3://benchfiles/2021-11-08T19:43:50,145489260+00:00
download: 's3://benchfiles/2021-11-08T19:43:50,145489260+00:00'
-> './2021-11-08T19:43:50,145489260+00:00' [1 of 1]
ERROR: Download of './2021-11-08T19:43:50,145489260+00:00'
failed (Reason: 500 (UnknownError))
ERROR: S3 error: 500 (UnknownError)
And the RGW log shows this
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new
request req=0x7f94b744d660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: WARNING:
set_req_state_err err_no=5 resorting to 500
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new
request req=0x7f94b6e41660 =====
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== req done
req=0x7f94b744d660 op status=-5 http_status=500
latency=0.020000568s ======
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: beast: 0x7f94b744d660:
110.2.0.46 - test1 [21/Feb/2024:08:27:06.021 +0000] "GET
/benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00 HTTP/1.1"
500 226 - - - latency=0.020000568s
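The "err_no=5" and "op status=-5" values look like a negated POSIX errno. That reading is an assumption, the log itself does not say so, but under it the code can be decoded with a short Python snippet:

```python
import errno
import os

rgw_op_status = -5  # "op status=-5" from the RGW log above

code = -rgw_op_status  # assumption: the status is a negated errno value
print(errno.errorcode[code], "-", os.strerror(code))
```

Under that assumption the status maps to EIO, an I/O error, which would fit RGW being unable to read the object because of the hash mismatches in PG 404.bc.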
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#OE3QXLAJIY6NU7PNMGHP47UK2CBZJPUG
--
Kai Stian Olstad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx