On Mon, Oct 19, 2015 at 7:09 PM, John Wilkins <jowilkin@xxxxxxxxxx> wrote:
> The classic case is when you are just trying Ceph out on a laptop (e.g.,
> using file directories for OSDs, setting the replica size to 2, and setting
> osd_crush_chooseleaf_type to 0).

Sure, but the text isn’t really applicable in that situation, is it? It’s specifically calling out the SSD as a single point of failure when it’s being used to journal multiple OSDs, as if that were an important consideration in determining the minimum failure domain. For single-node testing, the minimum-failure-domain ship has already pretty much sailed, and on any non-single-node deployment, testing or otherwise, a node realistically already is the minimum failure domain. (And isn’t it the default anyway?) Likewise, if you’re doing a single-node test with a bunch of OSDs on one drive, that drive is already a shared failure component, whether or not journalling is being done to a separate SSD.

> The statement is a guideline. You could, in fact, create a CRUSH hierarchy
> consisting of OSD/journal groups within a host too. However, capturing the
> host as a failure domain is preferred if you need to power down the host to
> change a drive (assuming it’s not hot-swappable).

The particular example given is of a single SSD for the entire node. Inside a given host/node, there are all sorts of single points of failure.

> There are cases with high density systems where you have multiple nodes in
> the same chassis. So you might opt for a higher minimum failure domain in a
> case like that.

Sure, my question was a bit unclear in that regard. There are plenty of cases where the minimum failure domain might be *larger* than a node (and you identified several good ones). Mainly I meant to ask under what circumstances the minimum failure domain might be *smaller* than a node. The only valid answer to that appears to be “testing.”

In light of that, perhaps the text as written places unnecessary emphasis on the minimum failure domain, applicable as that is only to testing, and only to a very specific hardware configuration that (probably) isn’t very common in testing. (And, when it is, the realities of the testing environment where it can come up essentially require going against the advice given anyway.) Perhaps the text would be of more benefit to a larger group of readers if that callout instead reflected the other practical considerations of packing multiple journals onto one SSD: namely, that your cluster must be designed to withstand the simultaneous failure of all OSDs that journal to that device, both in terms of excess capacity and rebalancing throughput.

Thanks!
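
P.S. For concreteness, here is roughly the kind of single-node test setup I have in mind when I say the minimum-failure-domain ship has already sailed there, plus the host-level CRUSH rule that any multi-node cluster effectively gives you by default. Both are just illustrative sketches of my own (the names and values are mine, not taken from the docs under discussion):

    # ceph.conf fragment for a throwaway single-node test (illustrative only)
    [global]
        # place replicas across OSDs rather than hosts, since there is only one host
        osd crush chooseleaf type = 0
        # two copies is fine for kicking the tires, not for production
        osd pool default size = 2
        osd pool default min size = 1

    # and, for contrast, the usual decompiled CRUSH rule on a multi-node cluster,
    # where 'type host' already makes the host the minimum failure domain:
    rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }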