Hello,

Any advice on this? I am stuck because one VM is not working now. It looks
like there is a read error on the primary OSD (osd.15) for this PG. Should I
mark osd.15 down or out? Is there any risk in doing this? The exact commands
I am considering are at the bottom of this message.

Apr 28 20:22:31 ceph-node3 kernel: [369172.974734] sd 0:2:4:0: [sde] tag#358 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 20:22:31 ceph-node3 kernel: [369172.974739] blk_update_request: I/O error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Apr 28 21:14:11 ceph-node3 kernel: [372273.275801] sd 0:2:4:0: [sde] tag#28 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Apr 28 21:14:11 ceph-node3 kernel: [372273.275809] sd 0:2:4:0: [sde] tag#28 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 21:14:11 ceph-node3 kernel: [372273.275813] blk_update_request: I/O error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0

On Thu, Apr 29, 2021 at 12:24 AM Lomayani S. Laizer <lomlaizer@xxxxxxxxx> wrote:
> Hello,
>
> Last week I upgraded my production cluster to Pacific. The cluster was
> healthy until a few hours ago, when a scrub that ran about 4 hours ago
> left it in an inconsistent state. I then issued "ceph pg repair 7.182"
> to try to repair the PG, but it ended up active+recovery_unfound+degraded.
>
> All OSDs are up and all run BlueStore, with a replication size of 3 and
> a minimum size of 2. I have restarted all OSDs, but that has not helped.
>
> Any recommendations on how to recover the cluster safely?
>
> I have attached the result of "ceph pg 7.182 query".
>
> ceph health detail
> HEALTH_ERR 1/2459601 objects unfound (0.000%); Possible data damage: 1 pg
> recovery_unfound; Degraded data redundancy: 3/7045706 objects degraded
> (0.000%), 1 pg degraded
> [WRN] OBJECT_UNFOUND: 1/2459601 objects unfound (0.000%)
>     pg 7.182 has 1 unfound objects
> [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
>     pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1 unfound
> [WRN] PG_DEGRADED: Degraded data redundancy: 3/7045706 objects degraded
> (0.000%), 1 pg degraded
>     pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1 unfound
>
> ceph -w
>   cluster:
>     id:     4b9f6959-fead-4ada-ac58-de5d7b149286
>     health: HEALTH_ERR
>             1/2459586 objects unfound (0.000%)
>             Possible data damage: 1 pg recovery_unfound
>             Degraded data redundancy: 3/7045661 objects degraded (0.000%),
>             1 pg degraded
>
>   services:
>     mon: 3 daemons, quorum mon-a,mon-b,mon-c (age 38m)
>     mgr: mon-a(active, since 38m)
>     osd: 46 osds: 46 up (since 25m), 46 in (since 3w)
>
>   data:
>     pools:   4 pools, 705 pgs
>     objects: 2.46M objects, 9.1 TiB
>     usage:   24 TiB used, 95 TiB / 119 TiB avail
>     pgs:     3/7045661 objects degraded (0.000%)
>              1/2459586 objects unfound (0.000%)
>              701 active+clean
>              3   active+clean+scrubbing+deep
>              1   active+recovery_unfound+degraded
>
> ceph pg 7.182 list_unfound
> {
>     "num_missing": 1,
>     "num_unfound": 1,
>     "objects": [
>         {
>             "oid": {
>                 "oid": "rbd_data.2f18f2a67fad72.000000000002021a",
>                 "key": "",
>                 "snapid": -2,
>                 "hash": 3951004034,
>                 "max": 0,
>                 "pool": 7,
>                 "namespace": ""
>             },
>             "need": "184249'118613008",
>             "have": "0'0",
>             "flags": "none",
>             "clean_regions": "clean_offsets: [], clean_omap: 0, new_object: 1",
>             "locations": []
>         }
>     ],
>     "state": "NotRecovering",
>     "available_might_have_unfound": true,
>     "might_have_unfound": [],
>     "more": false
> }
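To be concrete, these are the commands I am considering but have not run yet.
osd.15 and /dev/sde are taken from the logs above, and my understanding of
mark_unfound_lost comes from the docs, so please correct me if this is the
wrong approach:

    # on ceph-node3: check the health of the disk behind osd.15
    smartctl -a /dev/sde

    # mark osd.15 out so its PGs backfill onto the surviving replicas
    # (as far as I understand, marking it only "down" would not help much,
    #  since the running daemon would mark itself back up)
    ceph osd out 15

    # last resort only, if the unfound object really cannot be recovered
    # from any replica: revert it to its last known version, or forget it
    # entirely if it was a new object
    ceph pg 7.182 mark_unfound_lost revert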