--- On Thu, 5/6/10, Sage Weil <sage@xxxxxxxxxxxx> wrote:

> On Thu, 6 May 2010, Cláudio Martins wrote:
> > On Thu, 6 May 2010 14:02:40 -0700 (PDT) Martin Fick <mogulguy@xxxxxxxxx> wrote:
> > > To put this in the perspective of OSD setups, if you
> > > already have striping, using the replicas also may
> > > not make much of a difference, but I wonder how a two
> > > node OSD setup with double redundancy would fare?
> > > With such a setup there will not really be any
> > > striping, will there? With such a setup (one that I
> > > can easily see being popular for simple/minimal RBD
> > > redundancy setups), perhaps replica "striping"
> > > would help. A 'smart' RBD could detect non-contiguous
> > > reads and spread the reads out in that case.
> >
> > Unless I misunderstood the Ceph papers, the
> > current situation is not that bad.
> >
> > IIRC, a big file will be striped over many
> > different objects. Each object ID will map to
> > its own primary replica, which will vary from
> > object to object. Thus, given many clients reading
> > different chunks of that file, even 2 OSDs should
> > see a fairly equal amount of traffic. The same
> > should be true for small files, unless you have
> > lots of clients all reading the same file.
>
> Yeah, you've got it right. The rbd image is striped
> over small objects, which are independently assigned
> to OSDs. The load should be very well distributed.

How can that be on a 2 OSD setup with double redundancy? In this
case, if all of a replica's smaller objects are not on a single
node, how will it recover from an OSD failure? The only way I see
this being possible is if file foo is split into small objects
A1 A2 A3 A4 and replicas B1 B2 B3 B4, and you spread those across
2 OSDs like this:

  replica 1 (A1 B2 A3 B4)
  replica 2 (B1 A2 B3 A4)

but then A1 has to know that it is the same as B1. Is that the case?
If so, cool, that would mean that redundancy is already providing
some striping, and thus it would indeed seem harder to find a case
where more striping/fanout is needed.

Ciao,

-Martin
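
For what it is worth, the layout asked about above can be sketched in a few lines of Python. This is only a toy model, not how Ceph actually places data: the hash below is a stand-in for the real placement function, and the names `primary_osd`, `place`, `layout`, `survives_failure` and the `foo.%08x` object naming are all hypothetical. It assumes what was stated in the thread: the image is striped over small objects, each object is independently assigned a primary, and with 2 OSDs and double redundancy the second copy lands on the other OSD.

```python
import hashlib
from collections import Counter


def primary_osd(object_name: str, num_osds: int) -> int:
    """Pick a primary OSD by hashing the object name (toy stand-in, not CRUSH)."""
    digest = hashlib.sha1(object_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_osds


def place(object_name: str, num_osds: int = 2, replicas: int = 2) -> list:
    """Return the OSDs holding an object; the first entry is the primary."""
    first = primary_osd(object_name, num_osds)
    return [(first + i) % num_osds for i in range(replicas)]


def layout(image: str, num_objects: int, num_osds: int = 2, replicas: int = 2) -> dict:
    """Map every striped object of an image to its list of OSDs."""
    names = (f"{image}.{i:08x}" for i in range(num_objects))
    return {name: place(name, num_osds, replicas) for name in names}


def survives_failure(objs: dict, failed_osd: int) -> bool:
    """True if every object still has at least one copy off the failed OSD."""
    return all(any(osd != failed_osd for osd in osds) for osds in objs.values())


if __name__ == "__main__":
    objs = layout("foo", 1000)
    primaries = Counter(osds[0] for osds in objs.values())
    print("primaries per OSD:", dict(primaries))
    print("survives loss of OSD 0:", survives_failure(objs, 0))
    print("survives loss of OSD 1:", survives_failure(objs, 1))
```

In this model every object lives on both OSDs, so either one can fail without losing data, while the primary (the copy that would serve reads) alternates between the two from object to object. There is no separate "A" and "B" naming to reconcile: both copies carry the same object name, and only the order of the OSD list differs.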