On Fri, 29 Jun 2012, Brian Edmonds wrote:
> On Fri, Jun 29, 2012 at 2:11 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> > Well, actually this depends on the filesystem you're using. With
> > btrfs, the OSD will roll back to a consistent state, but you don't
> > know how out-of-date that state is.
>
> Ok, so assuming btrfs, then a single machine failure with a ramdisk
> journal should not result in any data loss, assuming replication is
> working? The cluster would then be at risk of data loss primarily
> from a full power outage. (In practice I'd expect either one machine
> to die, or a power loss to take out all of them, and smaller but
> non-unitary losses would be uncommon.)

Right. From a data-safety perspective ("the cluster said my writes were
safe... are they?") consider journal loss an OSD failure. If there aren't
other surviving replicas, something may be lost.

From a recovery perspective, it is a partial failure; not everything was
lost, and recovery will be quick (only recent objects get copied around).
Maybe your application can tolerate that, maybe it can't.

sage