Re: Proposal for improvements to auto repair

If I may, here's one more idea for improving the bluestore repairs.

In the CERN and RAL environments, inconsistent objects are most often
the result of "weak writes", which I've heard Lars discussing in the
past.
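In this case, the read fails during client I/O or deep-scrub, and SMART
increments the pending sector counter (Current_Pending_Sector). The
only way to know whether the sector is truly bad is to write to it
again: if the rewrite succeeds and the data reads back correctly, the
original write was merely weak.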

So, when (auto) repairing a bluestore object, could we first try to
overwrite the object in-place?
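
A minimal sketch of that flow, to make the idea concrete. Everything
here is hypothetical: FakeSector and repair_in_place are illustrative
names rather than BlueStore APIs, and zlib.crc32 stands in for
whatever checksum the store actually uses.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FakeSector:
    """Hypothetical stand-in for an on-disk extent; not a BlueStore API."""
    data: bytes
    weak: bool = False     # weak write: unreadable until it is rewritten
    damaged: bool = False  # truly bad media: unreadable even after a rewrite

    def read(self) -> bytes:
        if self.weak or self.damaged:
            raise IOError("unreadable sector")
        return self.data

    def write(self, data: bytes) -> None:
        self.data = data
        self.weak = False  # a rewrite clears a weak write; damage persists


def repair_in_place(sector: FakeSector, good_copy: bytes, expected_crc: int) -> str:
    """Return 'clean', 'repaired_in_place', or 'needs_relocation'."""

    def verified() -> bool:
        # A read error and a checksum mismatch both count as a failure.
        try:
            return zlib.crc32(sector.read()) == expected_crc
        except IOError:
            return False

    if verified():
        return "clean"
    # First attempt: overwrite the same location with a known-good copy.
    sector.write(good_copy)
    if verified():
        return "repaired_in_place"
    # The rewrite did not help, so the sector is presumably really bad.
    return "needs_relocation"
```

Only the last outcome would need the existing repair path, which
writes the object somewhere else.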

-- Dan


On Wed, Mar 6, 2019 at 11:38 PM David Zafman <dzafman@xxxxxxxxxx> wrote:
>
>
> Improvements to auto repair
>
> ------------------------
>
> We should allow auto repair for bluestore pools, since bluestore has
> built-in checksums.  Currently, we are limited to erasure coded pools.
>
> In order to trigger an auto repair when a regular scrub detects
> errors, any errors should immediately schedule a deep-scrub.
>
> Add a new pg state flag "failed_repair" when repairs can't fix all
> errors.  This may be tricky to implement because pg repair ends as a
> recovery operation.
>
> Set failed_repair if primary repair triggered by a client read fails.
>
> Add a count of the number of repaired objects to PG stats and OSD
> stats.
>
>
> David
>
>
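
The bookkeeping in the quoted proposal could be sketched roughly as
follows. This is only an illustration under assumed names: PGStats,
on_scrub_errors and on_repair_finished are hypothetical and do not
correspond to existing Ceph code.

```python
from dataclasses import dataclass


@dataclass
class PGStats:
    """Hypothetical per-PG bookkeeping; field names are illustrative."""
    objects_repaired: int = 0
    failed_repair: bool = False
    deep_scrub_scheduled: bool = False


def on_scrub_errors(stats: PGStats, errors_found: int) -> None:
    # Any error seen by a regular scrub immediately schedules a deep-scrub,
    # which is where auto repair can be triggered.
    if errors_found > 0:
        stats.deep_scrub_scheduled = True


def on_repair_finished(stats: PGStats, repaired: int, unfixed: int) -> None:
    # Count the repaired objects, and flag the PG if anything is left unfixed.
    stats.objects_repaired += repaired
    stats.failed_repair = unfixed > 0
```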


