Re: Data loss after backfill

On Mon, Apr 26, 2021 at 8:09 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Mon, Apr 26, 2021 at 4:04 AM <hase.jin@xxxxxxxxxxx> wrote:
> >
> > Hi Sage,
> >
> > > If the medium error you got led to an incomplete readdir()
> > > result from XFS, then Ceph doesn't try to cope with that.
> >
> > Do you mean this behavior is part of the Ceph specification?
> > Data loss actually occurs here, which is a problem, so I think we
> > need to solve it.
>
> I would frame it like this:
>
> - With FileStore, Ceph assumed that XFS would return results we could
> trust (i.e., it would not silently skip files).  Trusting XFS turned
> out to be a bad idea, and not just because of readdir--we also
> couldn't trust that any data returned by XFS was correct since XFS
> does not do any sort of data checksums.
> - We replaced FileStore with BlueStore, which checksums both metadata
> and data, solving this entire class of problems.
>
> The "fix" in this case is to replace your FileStore OSDs with
> BlueStore.  This particular backfill corner case is just one of many
> bad things that can happen with FileStore and media errors.

I agree. BlueStore handles such errors much better than FileStore,
and we have further improvements in the pipeline, such as
https://trello.com/c/pWbCyYsz/614-bluestore-make-asserts-unique-per-return-value
which will make it easier to distinguish issues in the underlying
layer.
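
To illustrate the checksum point above, here is a minimal sketch of
why verifying a checksum on every read turns silent corruption into a
detectable error. It is a toy and not Ceph code: the ChecksummedStore
class and its write/read/corrupt methods are hypothetical names made up
for this example, and zlib.crc32 is only a convenient stand-in for
whatever checksum a real store uses.

```python
import zlib


class ChecksummedStore:
    """Toy in-memory store (hypothetical, not Ceph code).

    Each blob is stored next to a CRC-32 of its contents, and the
    checksum is verified again on every read.
    """

    def __init__(self):
        self._blobs = {}  # name -> (bytearray data, int checksum)

    def write(self, name, data):
        # Record the checksum at write time, alongside the data.
        self._blobs[name] = (bytearray(data), zlib.crc32(data))

    def read(self, name):
        data, expected = self._blobs[name]
        # Recompute and compare; a mismatch means the stored bytes changed.
        if zlib.crc32(bytes(data)) != expected:
            raise IOError(f"checksum mismatch on {name!r}")
        return bytes(data)

    def corrupt(self, name, offset=0):
        # Simulate a media error by flipping one bit of the stored bytes.
        self._blobs[name][0][offset] ^= 0x01


if __name__ == "__main__":
    store = ChecksummedStore()
    store.write("obj", b"hello")
    store.corrupt("obj")
    try:
        store.read("obj")
    except IOError as err:
        print("detected:", err)
```

A store without the check in read() would simply hand back the altered
bytes, which is the failure mode described for FileStore on XFS.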

Neha

>
> sage
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
>
