On Mon, Apr 26, 2021 at 4:04 AM <hase.jin@xxxxxxxxxxx> wrote:
>
> Hi Sage,
>
> > If the medium error you got led to an incomplete readdir()
> > result from XFS, then Ceph doesn't try to cope with that.
>
> Do you mean this behavior is in Ceph specifications?
> It is a problem that data loss actually occurs, so I think we need to solve that.

I would frame it like this:

- With FileStore, Ceph assumed that XFS would return results we could
  trust (i.e., it would not silently skip files). Trusting XFS turned
  out to be a bad idea, and not just because of readdir--we also
  couldn't trust that any data returned by XFS was correct since XFS
  does not do any sort of data checksums.
- We replaced FileStore with BlueStore, which checksums both metadata
  and data, solving this entire class of problems.

The "fix" in this case is to replace your FileStore OSDs with
BlueStore. This particular backfill corner case is just one of many
bad things that can happen with FileStore and media errors.

sage
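
To illustrate the general idea of checksummed storage mentioned above, here is a minimal sketch in Python. It is not BlueStore code and does not reflect its on-disk format: the class `ChecksummedStore` and its methods are hypothetical names invented for this example, and it uses `zlib.crc32` from the standard library purely for convenience. The point is only that a store which keeps a checksum next to each blob can turn a silent media error into a loud read failure, which a store that trusts the underlying filesystem cannot do.

```python
import zlib


class ChecksummedStore:
    """Toy in-memory store that keeps a CRC32 next to each blob.

    Hypothetical illustration only; not how BlueStore is implemented.
    """

    def __init__(self):
        # name -> (mutable data buffer, checksum recorded at write time)
        self._blobs = {}

    def write(self, name, data):
        # Record the checksum of the data as it was written.
        self._blobs[name] = (bytearray(data), zlib.crc32(data))

    def read(self, name):
        data, expected = self._blobs[name]
        # Recompute on every read and refuse to return data that
        # no longer matches what was written.
        if zlib.crc32(bytes(data)) != expected:
            raise OSError(f"checksum mismatch reading {name!r}")
        return bytes(data)

    def corrupt(self, name, offset=0):
        # Simulate a media error by flipping one bit in place,
        # without updating the stored checksum.
        data, _ = self._blobs[name]
        data[offset] ^= 0x01


store = ChecksummedStore()
store.write("obj", b"payload")
assert store.read("obj") == b"payload"

store.corrupt("obj")
try:
    store.read("obj")
except OSError as err:
    print("detected:", err)
```

The sketch covers data only. The readdir() case from this thread is a metadata problem, and the same principle would have to be applied to whatever structure records which objects exist.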