Hello,

Any advice on this? I am stuck because one VM is not working now. It looks
like there is a read error on the primary OSD (osd.15) for this PG. Should I
mark osd.15 down or out? Is there any risk in doing this? The exact commands
I am considering are at the bottom of this message.

Apr 28 20:22:31 ceph-node3 kernel: [369172.974734] sd 0:2:4:0: [sde] tag#358 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 20:22:31 ceph-node3 kernel: [369172.974739] blk_update_request: I/O error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Apr 28 21:14:11 ceph-node3 kernel: [372273.275801] sd 0:2:4:0: [sde] tag#28 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Apr 28 21:14:11 ceph-node3 kernel: [372273.275809] sd 0:2:4:0: [sde] tag#28 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 21:14:11 ceph-node3 kernel: [372273.275813] blk_update_request: I/O error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0

On Thu, Apr 29, 2021 at 12:24 AM Lomayani S. Laizer <lomlaizer@xxxxxxxxx> wrote:
> Hello,
>
> Last week I upgraded my production cluster to Pacific. The cluster was
> healthy until a few hours ago, when a scrub that ran about 4 hours ago
> left it in an inconsistent state. I then issued "ceph pg repair 7.182"
> to try to repair the PG, but it ended up active+recovery_unfound+degraded.
>
> All OSDs are up and all run BlueStore, with a replication size of 3 and
> a minimum size of 2. I have restarted all OSDs, but that has not helped.
>
> Any recommendations on how to recover the cluster safely?
>
> I have attached the result of "ceph pg 7.182 query".
>
> ceph health detail
> HEALTH_ERR 1/2459601 objects unfound (0.000%); Possible data damage: 1 pg
> recovery_unfound; Degraded data redundancy: 3/7045706 objects degraded
> (0.000%), 1 pg degraded
> [WRN] OBJECT_UNFOUND: 1/2459601 objects unfound (0.000%)
>     pg 7.182 has 1 unfound objects
> [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
>     pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1 unfound
> [WRN] PG_DEGRADED: Degraded data redundancy: 3/7045706 objects degraded
> (0.000%), 1 pg degraded
>     pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1 unfound
>
> ceph -w
>   cluster:
>     id:     4b9f6959-fead-4ada-ac58-de5d7b149286
>     health: HEALTH_ERR
>             1/2459586 objects unfound (0.000%)
>             Possible data damage: 1 pg recovery_unfound
>             Degraded data redundancy: 3/7045661 objects degraded (0.000%),
>             1 pg degraded
>
>   services:
>     mon: 3 daemons, quorum mon-a,mon-b,mon-c (age 38m)
>     mgr: mon-a(active, since 38m)
>     osd: 46 osds: 46 up (since 25m), 46 in (since 3w)
>
>   data:
>     pools:   4 pools, 705 pgs
>     objects: 2.46M objects, 9.1 TiB
>     usage:   24 TiB used, 95 TiB / 119 TiB avail
>     pgs:     3/7045661 objects degraded (0.000%)
>              1/2459586 objects unfound (0.000%)
>              701 active+clean
>              3   active+clean+scrubbing+deep
>              1   active+recovery_unfound+degraded
>
> ceph pg 7.182 list_unfound
> {
>     "num_missing": 1,
>     "num_unfound": 1,
>     "objects": [
>         {
>             "oid": {
>                 "oid": "rbd_data.2f18f2a67fad72.000000000002021a",
>                 "key": "",
>                 "snapid": -2,
>                 "hash": 3951004034,
>                 "max": 0,
>                 "pool": 7,
>                 "namespace": ""
>             },
>             "need": "184249'118613008",
>             "have": "0'0",
>             "flags": "none",
>             "clean_regions": "clean_offsets: [], clean_omap: 0, new_object: 1",
>             "locations": []
>         }
>     ],
>     "state": "NotRecovering",
>     "available_might_have_unfound": true,
>     "might_have_unfound": [],
>     "more": false
> }
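To be concrete, these are the commands I am considering but have not run yet.
osd.15 and /dev/sde are taken from the logs above, and my understanding of
mark_unfound_lost comes from the docs, so please correct me if this is the
wrong approach:

    # on ceph-node3: check the health of the disk behind osd.15
    smartctl -a /dev/sde

    # mark osd.15 out so its PGs backfill onto the surviving replicas
    # (as far as I understand, marking it only "down" would not help much,
    #  since the running daemon would mark itself back up)
    ceph osd out 15

    # last resort only, if the unfound object really cannot be recovered
    # from any replica: revert it to its last known version, or forget it
    # entirely if it was a new object
    ceph pg 7.182 mark_unfound_lost revert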