Why does ceph need a filesystem (was Simulating Disk Failure)

James Harper <james.harper@xxxxxxxxxxxxxxxx> · Sat, 15 Jun 2013 03:07:06 +0000

> 
> Yeah. You've picked up on some warty bits of Ceph's error handling here for
> sure, but it's exacerbated by the fact that you're not simulating what you
> think. In a real disk error situation the filesystem would be returning EIO or
> something, but here it's returning ENOENT. Since the OSD is authoritative for
> that key space and the filesystem says there is no such object, presto! It
> doesn't exist.
> If you restart the OSD it does a scan of the PGs on-disk as well as what it
> should have, and can pick up on the data not being there and recover. But
> "correctly" handling data that has been (from the local FS' perspective)
> properly deleted under a running process would require huge and expensive
> contortions on the part of the daemon (in any distributed system that I can
> think of).
> -Greg
> 

Why was the decision made for ceph to require an underlying filesystem, rather than direct access to disk (like drbd does)?

All of my recent disk failures have been unrecoverable read errors (pending sector in SMART stats), which are easy enough to repair in the short term just by rewriting with a known good copy of the data (assuming that there isn't some other underlying cause and this was just a power-off-at-the-wrong-moment error). Unfortunately because of the disconnect between ceph and the LBA this can't be done by ceph.

Just curious...

Thanks

James
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com