2010/5/6 Martin Fick <mogulguy@xxxxxxxxx>:
>> Yeah, you've got it right. The rbd image is striped
>> over small objects, which are independently assigned
>> to OSDs. The load should be very well distributed.
>
> How can that be on a 2 OSD setup with double redundancy?
> In this case, if all of a replica's smaller objects are
> not on a single node, how will it recover from an OSD
> failure?
>
> The only way I see this possible is if file foo is
> split into small objects A1 A2 A3 A4 and replicas B1
> B2 B3 B4 and you spread those across 2 OSDs like this:
>
> replica 1 (A1 B2 A3 B4)
> replica 2 (B1 A2 B3 A4)
>
> but then A1 has to know that it is the same as B1. Is
> that the case?

The hashing probably isn't quite even enough to alternate the objects,
but yes -- different objects (even those forming a single "file") will
have different primary replicas even in a small system. Since the
default RBD object size is 4MB, and the disk is presumably anywhere
from several to hundreds of gigabytes, you've got a reasonably
well-striped system.
-Greg
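
To make the distribution concrete, here is a rough Python sketch. It is not Ceph's actual placement code: it assumes a plain SHA-256 hash standing in for CRUSH, a made-up object naming scheme, two OSDs numbered 0 and 1, and a hypothetical 10GB image called "foo". The only number taken from the thread is the 4MB object size.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024   # 4MB object size, as mentioned in the thread
OSDS = [0, 1]                   # assumption: a 2 OSD setup, ids 0 and 1


def object_names(image, image_bytes):
    """Split an image into fixed-size objects (naming scheme is made up)."""
    count = (image_bytes + OBJECT_SIZE - 1) // OBJECT_SIZE
    return [f"{image}.{i:012x}" for i in range(count)]


def placement(name, osds=OSDS):
    """Return (primary, replica) for one object.

    Stand-in for real placement: hash the object name to pick the
    primary, then take the next OSD in the list as the replica.
    """
    digest = hashlib.sha256(name.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(osds)
    order = osds[start:] + osds[:start]
    return order[0], order[1]


def primary_counts(names):
    """Count how many objects have each OSD as their primary."""
    counts = {osd: 0 for osd in OSDS}
    for name in names:
        counts[placement(name)[0]] += 1
    return counts


def survivors(names, failed_osd):
    """Objects that still have at least one copy after an OSD fails."""
    return [n for n in names
            if any(osd != failed_osd for osd in placement(n))]


if __name__ == "__main__":
    names = object_names("foo", 10 * 1024**3)   # hypothetical 10GB image
    print(len(names), "objects")
    print("primaries per OSD:", primary_counts(names))
    print("readable after osd 0 fails:", len(survivors(names, 0)))
```

With two OSDs and two copies, every object ends up on both nodes, so either node alone still holds the whole image. What varies per object is only which node acts as primary, and since each object is hashed on its own name, the primaries are split between the two nodes without any object needing to know about its neighbours.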