When I see this problem, I usually:

- run `ceph pg repair`
- remove the OSD from the cluster
- replace the disk
- recreate the OSD on the new disk

Cheers,
Massimo

On Wed, May 20, 2020 at 9:41 PM Peter Lewis <plewis@xxxxxxxxxxxxxx> wrote:
> Hello,
>
> I came across a section of the documentation that I don't quite
> understand. In the section about inconsistent PGs, it says that if one of
> the shards listed by `rados list-inconsistent-obj` has a read_error, the
> disk is probably bad.
>
> Quote from the documentation:
> https://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent
> `If read_error is listed in the errors attribute of a shard, the
> inconsistency is likely due to disk errors. You might want to check your
> disk used by that OSD.`
>
> I determined that the disk is bad by looking at the output of smartctl.
> I would think that replacing the disk (removing the OSD from the cluster
> and allowing the cluster to recover) would fix this inconsistency
> without my having to run `ceph pg repair`.
>
> Can I just replace the OSD and let recovery resolve the inconsistency?
> Or would it be better to run `ceph pg repair` first and then replace the
> OSD associated with the bad disk?
>
> Thanks!
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
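For reference, the steps above can be sketched as the following commands. This is only an illustrative outline, not a definitive procedure: the PG id `2.5`, OSD id `12`, and device `/dev/sdX` are placeholders you would substitute with your own values, and the exact removal/recreation steps vary by Ceph release and deployment tooling.

```shell
# Placeholder IDs for illustration only: pg 2.5, osd.12, /dev/sdX.

# 1. Inspect the inconsistency and confirm which shard has the read_error.
rados list-inconsistent-obj 2.5 --format=json-pretty

# 2. Repair the PG (rewrites the bad copy from an authoritative replica).
ceph pg repair 2.5

# 3. Mark the OSD out, wait for data to rebalance, then remove it.
ceph osd out 12
ceph osd purge 12 --yes-i-really-mean-it

# 4. After physically swapping the disk, create a new OSD on the device.
ceph-volume lvm create --data /dev/sdX
```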