Hi all,

The cluster here is running v14.2.20 and is used for RBD images.

We have a PG in recovery_unfound state and, since this is the first time we've had this occur, we wanted to get your advice on the best course of action.

PG 4.1904 went into state active+recovery_unfound+degraded+repair [1] during normal scrubbing (but note that we have `osd scrub auto repair = true`):

2021-06-13 03:15:11.559680 osd.951 (osd.951) 138 : cluster [DBG] 4.1904 repair starts
2021-06-13 04:00:49.369256 osd.951 (osd.951) 139 : cluster [ERR] 4.1904 shard 951 soid 4:209cfddb:::rbd_data.3a4ff12d847b61.000000000001c39e:head : candidate had a read error

The scrub detected a read error on the primary of this PG, and tried to repair it by reading from the other two OSDs:

Jun 13 04:00:46 xxx kernel: sd 0:0:25:0: [sdp] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DR
Jun 13 04:00:46 xxx kernel: sd 0:0:25:0: [sdp] tag#6 Sense Key : Medium Error [current] [descript
Jun 13 04:00:46 xxx kernel: sd 0:0:25:0: [sdp] tag#6 Add. Sense: Unrecovered read error
Jun 13 04:00:46 xxx kernel: sd 0:0:25:0: [sdp] tag#6 CDB: Read(16) 88 00 00 00 00 02 ba 8c 0b 00
Jun 13 04:00:46 xxx kernel: blk_update_request: critical medium error, dev sdp, sector 1171967531

But it seems that the other two OSDs could not repair this failed read on the primary, because they don't have the expected version of the object:

2021-06-13 04:28:29.412765 osd.951 (osd.951) 140 : cluster [ERR] 4.1904 repair 0 missing, 1 inconsistent objects
2021-06-13 04:28:29.413320 osd.951 (osd.951) 141 : cluster [ERR] 4.1904 repair 1 errors, 1 fixed
2021-06-13 04:28:29.445659 osd.14 (osd.14) 414 : cluster [ERR] 4.1904 push 4:209cfddb:::rbd_data.3a4ff12d847b61.000000000001c39e:head v 3592634'367863320 failed because local copy is 3593555'368312656
2021-06-13 04:28:29.472554 osd.344 (osd.344) 124 : cluster [ERR] 4.1904 push 4:209cfddb:::rbd_data.3a4ff12d847b61.000000000001c39e:head v 3592634'367863320 failed because local copy is 3593555'368312656
2021-06-13 04:28:30.863807 mgr.yyy (mgr.692832499) 648287 : cluster [DBG] pgmap v557097: 19456 pgs: 1 active+recovery_unfound+degraded+repair, 2 active+clean+scrubbing, 19423 active+clean, 30 active+clean+scrubbing+deep+repair; 1.3 PiB data, 4.0 PiB used, 2.1 PiB / 6.1 PiB avail; 350 MiB/s rd, 766 MiB/s wr, 16.93k op/s; 3/1063641423 objects degraded (0.000%); 1/354547141 objects unfound (0.000%)

I don't understand how the versions of the objects would get out of sync -- there have been no other recent failures on these disks, AFAICT. So my best guess is that the IO error on osd.951 confused the repair process: osd.951 tried to recover a non-latest version of the object. (This would imply that the object versions on osds 14 and 344 are in fact the correct, newest versions.)

We have a few ideas how to fix this:

* osd.951 is sick, so drain it by setting `ceph osd primary-affinity 951 0` and `ceph osd out 951`.
* osd.951 is really sick, so just stop it now and backfill its PGs to other OSDs.
* Don't stop osd.951 yet: restart all three relevant OSDs and see if that fixes the object versions.
* Don't drain osd.951 yet: make osd.14 or osd.344 the primary for this PG (e.g. `ceph osd primary-affinity 951 0`), then run `ceph pg repair 4.1904` so that the version from osds 14/344 can be pushed. (A rough command sequence is sketched below the list.)
* Use mark_unfound_lost revert, or delete (and tell the user to fsck their image).

Does anyone have some recent experience or advice on this issue?
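For reference, here is roughly the sequence behind the last two options, plus a couple of standard checks (`ceph pg ... list_unfound` and `rados list-inconsistent-obj`) that aren't mentioned above. This is just a sketch using the ids from this thread; we'd re-check the PG state between steps:

# see which object is unfound and what each shard reports
ceph pg 4.1904 list_unfound
rados list-inconsistent-obj 4.1904 --format=json-pretty

# demote the sick primary so osd.14 or osd.344 leads the PG, then retry the repair
ceph osd primary-affinity 951 0
ceph pg repair 4.1904

# last resort only, if the copies on 14/344 really can't be pushed:
# roll the object back to its previous version (or delete it)
ceph pg 4.1904 mark_unfound_lost revert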
Best Regards,

Dan

[1]
# ceph pg 4.1904 query
{
    "state": "active+recovery_unfound+degraded+repair",
    "snap_trimq": "[1c7fd~1,1c7ff~1,1c801~1,1c803~1,1c805~1]",
    "snap_trimq_len": 5,
    "epoch": 3593586,
    "up": [
        951,
        344,
        14
    ],
    "acting": [
        951,
        344,
        14
    ],
    "acting_recovery_backfill": [
        "14",
        "344",
        "951"
    ],
...