Re: new scrub and repair discussion

On Mon, May 23, 2016 at 11:54 AM, Shinobu Kinjo <shinobu.kj@xxxxxxxxx> wrote:
> On Wed, Nov 11, 2015 at 8:44 PM, kefu chai <tchaikov@xxxxxxxxx> wrote:
>> currently, scrub and repair are pretty primitive. there are several
>> improvements which need to be made:
>>
> [snip]
>> - repair will create a new version so that possibly corrupted copies
>> on down OSDs will get fixed naturally.
>
> If end users run this new feature manually, it may be better to
> implement a dry-run mechanism so that the above process could be
> skipped, and end users could initiate the scrub process with more
> information, and maybe more safely.

to implement a dry-run, we have two possible ways:

1. export the inconsistency detection logic to the client, and expose the
full scrub map to the client, so the user can run the inconsistency
detection algorithm against the updated scrub map.
2. persist the proposed change in the osd, and, when running the
inconsistency detection logic, override the object information with the
proposed values if there are any.

imho, the first one is more viable, but it is much more complicated
than the current design. maybe we can do it after the repair write API
is ready.
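
a rough sketch of what the client-side part of option 1 could look like,
assuming the client has already obtained the per-shard data digests from an
exported scrub map. everything here is hypothetical and for illustration
only: `RepairPlan`, `plan_repair`, `dry_run` and the
`{oid: {osd_id: data_digest}}` layout are not part of librados. the point is
just that the election can run without issuing any repair write, so the user
sees what a repair would pick before committing to it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RepairPlan:
    """What a repair *would* do for one object (hypothetical type)."""
    oid: str
    auth_osd: Optional[int]   # None when no strict majority exists
    suspect_osds: List[int]   # shards that would be overwritten or need review
    reason: str


def plan_repair(oid: str, shard_digests: Dict[int, int]) -> RepairPlan:
    """Elect an authoritative copy by majority vote; never writes anything."""
    if not shard_digests:
        return RepairPlan(oid, None, [], "no shards reported")
    votes = Counter(shard_digests.values())
    digest, count = votes.most_common(1)[0]
    # Require a strict majority so a tie never silently picks a winner.
    if count * 2 <= len(shard_digests):
        return RepairPlan(oid, None, sorted(shard_digests), "no majority digest")
    # Lowest osd id among the winners, to keep the result deterministic.
    auth_osd = min(osd for osd, d in shard_digests.items() if d == digest)
    suspects = sorted(osd for osd, d in shard_digests.items() if d != digest)
    return RepairPlan(oid, auth_osd, suspects, "majority digest")


def dry_run(scrub_map: Dict[str, Dict[int, int]]) -> List[RepairPlan]:
    """Build a plan for every object in the (assumed) exported scrub map."""
    return [plan_repair(oid, shards) for oid, shards in sorted(scrub_map.items())]


if __name__ == "__main__":
    example = {
        "obj_a": {0: 0xDEAD, 1: 0xDEAD, 2: 0xBEEF},  # osd.2 disagrees
        "obj_b": {0: 1, 1: 2, 2: 3},                 # no two replicas agree
    }
    for plan in dry_run(example):
        print(plan)
```

in the no-majority case the sketch refuses to pick an auth copy, which is
the situation where a dry-run is most useful: the user can inspect the
shards before any new version gets created.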

>
> Make sense?
>
> Cheers,
> Shinobu
>
>>
>> so librados will offer enough information and facilities, with which a
>> smart librados client/script will be able to fix the inconsistencies
>> found in the scrub.
>>
>> as an example, suppose we run into a data inconsistency where the 3
>> replicas fail to agree with each other after a deep scrub. we'd
>> probably like to hold an election to pick the auth copy. the
>> following pseudo code explains how we will implement this using the
>> new rados APIs for scrub and repair.
>>
>>      from collections import defaultdict
>>      import operator
>>
>>      # something is not necessarily better than nothing
>>      rados.aio_scrub(pg, completion)
>>      completion.wait_for_complete()
>>      for pool in rados.get_inconsistent_pools():
>>           for pg in rados.get_inconsistent_pgs(pool):
>>                # rados.get_inconsistent_pgs() throws if "epoch" expires
>>
>>                for oid, inconsistent in rados.get_inconsistent_pgs(pg, epoch).items():
>>                     if inconsistent.is_data_digest_mismatch():
>>                          # count how many shards report each data digest
>>                          votes = defaultdict(int)
>>                          for osd, shard_info in inconsistent.shards.items():
>>                               votes[shard_info.object_info.data_digest] += 1
>>                          # the digest with the most votes wins the election
>>                          digest, _ = max(votes.items(), key=operator.itemgetter(1))
>>                          # the first osd holding the winning digest is the auth copy
>>                          auth_copy = None
>>                          for osd, shard_info in inconsistent.shards.items():
>>                               if shard_info.object_info.data_digest == digest:
>>                                    auth_copy = osd
>>                                    break
>>                          repair_op = librados.ObjectWriteOperation()
>>                          repair_op.repair_pick(auth_copy, inconsistent.ver, epoch)
>>                          rados.aio_operate_scrub(oid, repair_op)
>>
>> this plan was also discussed in the infernalis CDS. see
>> http://tracker.ceph.com/projects/ceph/wiki/Osd_-_Scrub_and_Repair.
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Email:
> shinobu@xxxxxxxxx
> shinobu@xxxxxxxxxx



-- 
Regards
Kefu Chai


