Hi,

We have been having a lot of discussions at my workplace about whether to deploy a Ceph cluster in production and, if so, how to set up the hardware for it. During that discussion I mentioned that, according to the documentation, we should see significant speedups from putting the OSDs' journals on dedicated SSDs. Unfortunately, my colleagues did not like this idea at all: many of them have had bad experiences with failing SSDs, or have at least read a lot about such failures on the Internet, and there is a general consensus among them that SSDs are not yet reliable enough for production servers.

This leads me to the question: what exactly can happen if an OSD's journal device suddenly fails during operation? Can that lead to data loss or corruption, or to disruption of the service?

In my experience with the small three-machine test cluster I have here, a single failed node usually leads to a fairly severe outage of the entire cluster, on the order of ten minutes or more (probably much more when a really big node fails), though so far no data loss or corruption...

Regards,
Guido
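For context, the setup under discussion is roughly the following ceph.conf sketch (FileStore-era settings; the partition labels are hypothetical examples, not values from the original mail): each OSD keeps its data on a spinning disk while its journal points at a partition on a shared SSD, so a single SSD failure would take out the journals of several OSDs at once.

```ini
; Hypothetical ceph.conf excerpt (FileStore era).
; Device paths below are illustrative assumptions only.
[osd]
osd journal size = 10240               ; journal size in MB

[osd.0]
osd journal = /dev/disk/by-partlabel/journal-osd0   ; partition on shared SSD

[osd.1]
osd journal = /dev/disk/by-partlabel/journal-osd1   ; partition on same SSD
```

With this layout, the blast radius of one journal SSD failing is every OSD whose journal lives on it, which is presumably what makes the failure-mode question above important.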