I posted the same question to the list last week and never got a reply. I'd also like to know whether there's a difference in failure behavior between XFS-backed Ceph (write-ahead journaling) and BTRFS-backed Ceph (parallel journaling).

Calvin

On Fri, May 18, 2012 at 12:30 PM, Guido Winkelmann <guido-ceph@xxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> We have been having a lot of discussions at my workplace about whether to
> deploy a Ceph cluster in production and, if so, how to set up the hardware
> for it. During that discussion, I mentioned that, according to the
> documentation, we should see significant speedups from using dedicated SSDs
> for the OSDs' journals. Unfortunately, my colleagues did not like this idea
> at all: many of them have had bad experiences with SSDs failing, or have at
> least read a lot about such failures, and the general consensus among them
> is that SSDs are just not quite reliable enough yet for production servers.
>
> This leads me to the question: what exactly can happen if an OSD's journal
> device suddenly fails during operation? Can that lead to data loss,
> corruption, or disruption of the service?
>
> In my experience with the small three-machine test cluster I have here, a
> single failed node would usually lead to a pretty severe outage of the
> entire cluster, on the order of ten minutes or more (probably much more
> when a really big node fails), though so far there has been no data loss
> or corruption.
>
> Regards,
>
> Guido
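
For reference, the dedicated-SSD journal setup Guido describes is configured per OSD in ceph.conf. A minimal sketch, assuming a hypothetical SSD partition /dev/sdb1 dedicated to osd.0's journal (the device path, host name, and journal size here are illustrative, not taken from the thread):

    [osd]
            ; journal size in MB; 1 GB is only an illustrative value
            osd journal size = 1024

    [osd.0]
            host = node1
            ; point the journal at a dedicated SSD partition instead of a
            ; file on the OSD's data disk (hypothetical device path)
            osd journal = /dev/sdb1

With a layout like this, losing the SSD takes down every OSD journaling to it, so the blast radius of one SSD failure depends on how many OSD journals share the device.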
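As for the roughly ten-minute outage window: the length of the disruption is governed largely by the cluster's failure-detection and re-replication timers rather than by the journal itself. A minimal sketch of the relevant ceph.conf knobs, assuming the commonly cited defaults of that era (values are assumptions, not taken from the thread):

    [osd]
            ; seconds without heartbeats before peers report an OSD as down
            osd heartbeat grace = 20

    [mon]
            ; seconds a down OSD may stay "in" before the monitors mark it
            ; out and the cluster starts re-replicating its data
            mon osd down out interval = 300

Lowering these values shortens the window during which I/O to the affected placement groups stalls after a node or journal failure, at the cost of triggering rebalancing on shorter, possibly transient, outages.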