A simple improvement would be to merge this:
https://github.com/ceph/ceph/pull/25148

Enabling auto_repair currently triggers HEALTH_ERR for perfectly healthy
PGs because it sets the repair flag on the PG while deep-scrubbing it.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Mar 6, 2019 at 11:38 PM David Zafman <dzafman@xxxxxxxxxx> wrote:
>
> Improvements to auto repair
> ---------------------------
>
> We should allow auto repair for bluestore pools, since bluestore has
> built-in checksums. Currently, we are limited to erasure coded pools.
>
> In order to trigger an auto repair when a regular scrub detects errors,
> any errors should immediately schedule a deep-scrub.
>
> Add a new pg state flag "failed_repair" for when repairs can't fix all
> errors. This may be tricky to implement because pg repair ends as a
> recovery operation.
>
> Set failed_repair if a primary repair triggered by a client read fails.
>
> Add a count of the number of objects repaired to PG stats and OSD
> stats.
>
> David
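
To make the quoted proposal concrete, here is a minimal Python sketch of the flow it describes: a regular scrub that finds errors schedules a deep-scrub, auto repair fixes what it can, failed_repair is set if anything is left over, and repaired objects are counted. The PG class, the function names, and the "repairable" set are all hypothetical stand-ins for illustration. This is a toy model of the idea, not Ceph's actual code or API.

```python
from dataclasses import dataclass, field


@dataclass
class PG:
    """Toy stand-in for a placement group (hypothetical, not Ceph's PG class)."""
    errors: set = field(default_factory=set)   # names of inconsistent objects
    state: set = field(default_factory=set)    # state flags, e.g. "failed_repair"
    deep_scrub_scheduled: bool = False
    objects_repaired: int = 0                  # the proposed repaired-objects stat


def regular_scrub(pg):
    """A regular scrub that sees errors immediately schedules a deep-scrub."""
    if pg.errors:
        pg.deep_scrub_scheduled = True


def deep_scrub_with_auto_repair(pg, repairable):
    """Repair what can be repaired; flag the PG if any errors remain."""
    pg.deep_scrub_scheduled = False
    fixed = pg.errors & repairable             # objects this repair can fix
    pg.objects_repaired += len(fixed)
    pg.errors -= fixed
    if pg.errors:
        pg.state.add("failed_repair")
    else:
        pg.state.discard("failed_repair")


# Three inconsistent objects, of which only two can be repaired.
pg = PG(errors={"obj1", "obj2", "obj3"})
regular_scrub(pg)
if pg.deep_scrub_scheduled:
    deep_scrub_with_auto_repair(pg, repairable={"obj1", "obj2"})
print(sorted(pg.state), pg.objects_repaired)
```

In this model the flag is cleared again once a later repair leaves no errors behind. That is an assumption of the sketch; the proposal itself does not say when failed_repair should be cleared.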