Re: Error handling during recovery read

David Zafman <dzafman@xxxxxxxxxx> · Fri, 4 Dec 2015 18:24:46 -0800

I can't remember the details now, but I know that recovery needed
additional work. If it had been a simple fix, I would have done it
when implementing that code.

I found this bug related to recovery and EC errors
(http://tracker.ceph.com/issues/13493):
BUG #13493: osd: for ec, cascading crash during recovery if one shard is corrupted

David

On 12/4/15 2:03 AM, Markus Blank-Burian wrote:
Hi David,

I am using Ceph 9.2.0 with an erasure-coded pool and am having problems
with missing objects.

Reads of degraded or backfilling objects on an EC pool that hit an error
(-2, i.e. -ENOENT, in my case) seem to be aborted immediately instead of
being completed from the remaining shards. Why is there an explicit check
for "!rop.for_recovery" in ECBackend::handle_sub_read_reply? Would it be
possible to remove this check and let the recovery read complete from the
remaining good shards?
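A minimal sketch of the behaviour being asked about: when a shard sub-read fails during recovery, pick additional healthy shards instead of aborting, and give up only if fewer than k good shards remain. It assumes an MDS code (e.g. Reed-Solomon) where any k of the k+m shards are enough to decode; non-MDS plugins such as LRC or SHEC need a different selection rule. The function `shards_to_retry` and its parameters are hypothetical illustrations, not the actual ECBackend::handle_sub_read_reply logic.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>

// Hypothetical helper, NOT Ceph's ECBackend API.
// candidates: shards that should hold the object
// good:       shards whose sub-read already succeeded
// failed:     shards whose sub-read returned an error (e.g. -ENOENT)
// k:          number of shards needed to decode (assumes an MDS code)
// Returns the extra shards to read so that k good shards can be reached,
// an empty set if k good shards are already available, or std::nullopt if
// too few healthy shards remain and the read must really fail.
std::optional<std::set<int>> shards_to_retry(const std::set<int>& candidates,
                                             const std::set<int>& good,
                                             const std::set<int>& failed,
                                             std::size_t k) {
  assert(k > 0);
  std::set<int> extra;
  std::size_t have = good.size();
  for (int shard : candidates) {
    if (have >= k) break;
    // Skip shards already read successfully or known to be bad.
    if (good.count(shard) || failed.count(shard)) continue;
    extra.insert(shard);
    ++have;
  }
  if (have < k) return std::nullopt;  // not enough healthy shards left
  return extra;
}
```

For a k=4, m=2 pool where shard 2 returns an error after shards 0, 1 and 3 were read, this sketch asks for shard 4 rather than aborting the recovery read.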

Markus
