Re: Data loss after backfill

Sage Weil <sage@xxxxxxxxxxxx> · Fri, 23 Apr 2021 15:01:59 -0500

Hi!

On Fri, Apr 23, 2021 at 3:53 AM <hase.jin@xxxxxxxxxxx> wrote:
> Hi, I would like to report a serious data-loss problem in a customer environment.
>
> Data loss on one OSD has propagated to other OSDs. If backfill is performed while a shard is missing on the primary OSD, the corresponding shard is also missing on the OSD that is the backfill target.
> With 4+2 erasure coding, if one backfill copies to two OSDs, three shards are missing (the primary plus the two copies), which makes data recovery impossible. Whether data is lost depends on the erasure-coding settings and the number of copies made during backfill.
>
> In fact, I was able to reproduce this situation. This is actual data loss, and we need to fix this problem.
> I will verify this with the latest version of Ceph, then open a ticket in Redmine and report detailed information.
> For now, this mail shares only a brief description of the environment and the procedure to reproduce.
>
> Environment:
> - Ceph version: Nautilus
> - Erasure coding: 4+2
> - Type: filestore
>
> Steps to Reproduce:
> 1. Set up more than 6 OSDs (leaving some extra OSDs out).
> 2. Store some objects in the pool.
> 3. Delete a file from the primary OSD of the PG.
>    (In the customer environment, the shard on the primary OSD was not recognized because of a medium error on that OSD. To simulate this situation, run `rm`.)
>    e.g.) rm -f /var/lib/ceph/osd/ceph-7/current/1.0s0_head/<some file>.04.21.09\:55\:*

This could be simulating either of two different "failures":

1. A listing of objects during recovery doesn't find the object, and
thus doesn't recover it.  This case is really a problem with
filestore.  If the medium error you got led to an incomplete readdir()
result from XFS, then Ceph doesn't try to cope with that.
2. The object is in the pg log and recovery tries to read it, but gets
ENOENT, and backfill (silently?) skips it.  This would be a real
problem that affects bluestore as well.  An error in this case should
either trigger recovery from the remaining shards (if possible) or
log an 'unfound' object.
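
To make the expected behaviour in case 2 concrete, here is a minimal
Python sketch of the decision.  It is not Ceph code: the names
`Outcome` and `handle_missing_shard` are hypothetical, and it assumes
an erasure code where any k of the k+m shards are enough to rebuild an
object (as with a Reed-Solomon style 4+2 profile).

```python
from enum import Enum


class Outcome(Enum):
    RECOVER_FROM_PEERS = "recover from remaining shards"
    MARK_UNFOUND = "log object as unfound"


def handle_missing_shard(k: int, m: int, readable_shards: int) -> Outcome:
    """Decide what to do when a shard read during backfill returns ENOENT.

    k, m: data and coding chunk counts of the erasure-code profile
          (4 and 2 in the report above).
    readable_shards: how many shards of the object can still be read
          from the other OSDs (hypothetical input for this sketch).
    """
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    if not 0 <= readable_shards <= k + m:
        raise ValueError("readable_shards must be between 0 and k + m")

    # Assumption: any k shards suffice to reconstruct the object.
    if readable_shards >= k:
        return Outcome.RECOVER_FROM_PEERS

    # Fewer than k shards are left: never skip silently, report it.
    return Outcome.MARK_UNFOUND


# 4+2 with only the primary shard gone: 5 readable shards remain.
print(handle_missing_shard(4, 2, 5).value)
# 4+2 with primary + two backfill targets gone: 3 readable shards remain.
print(handle_missing_shard(4, 2, 3).value)
```

The second call matches the scenario in the report: with three of six
shards missing, fewer than k=4 remain, so the object can only be
reported as unfound rather than rebuilt.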

Do you know which of these was triggered by your medium error?

sage
_______________________________________________
Dev mailing list -- dev@xxxxxxx