Re: RBD/OSD questions

--- On Thu, 5/6/10, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 6 May 2010, Cláudio Martins wrote:
> > On Thu, 6 May 2010 14:02:40 -0700 (PDT) Martin Fick <mogulguy@xxxxxxxxx> wrote:
> > > To put this in the perspective of OSD setups, if you
> > > already have striping, using the replicas also may
> > > not make much of a difference, but I wonder how a two
> > > node OSD setup with double redundancy would fare?
> > > With such a setup there will not really be any
> > > striping, will there?  With such a setup (one that I
> > > can easily see being popular for simple/minimal RBD
> > > redundancy setups), perhaps replica "striping"
> > > would help.  A 'smart' RBD could detect non-contiguous
> > > reads and spread the reads out in that case.
> > 
> >  Unless I misunderstood the Ceph papers, the
> > current situation is not that bad.
> > 
> >  IIRC, a big file will be striped over many
> > different objects. Each object ID will map to
> > its own primary replica, which will vary from
> > object to object. Thus, given many clients reading
> > different chunks of that file, even 2 OSDs should
> > see a fairly equal amount of traffic. The same
> > should be true for small files, unless you have
> > lots of clients all reading the same file.
> 
> Yeah, you've got it right.  The rbd image is striped
> over small objects, which are independently assigned 
> to OSDs.  The load should be very well distributed.


How can that be on a 2 OSD setup with double redundancy?
In this case, if all of a replica's smaller objects are
not on a single node, how will it recover from an OSD
failure?

The only way I see this possible is if file foo is 
split into small objects A1 A2 A3 A4 and replicas B1 
B2 B3 B4 and you spread those across 2 OSDs like this:

replica 1 (A1 B2 A3 B4)
replica 2 (B1 A2 B3 A4)

but then A1 has to know that it is the same as B1.  Is
that the case?  If so, cool, that would mean that
redundancy would already be providing some striping
and thus, it would indeed seem harder to find a case
where more striping/fanout is needed.
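
To make the layout I have in mind concrete, here is a minimal
Python sketch.  It is only an illustration under my own
assumptions: the object names (foo.0000 ...), the OSD names, and
the md5-based choice of primary are all made up, and the hash is
a simple stand-in for whatever placement Ceph really uses (it is
not CRUSH).  It only shows that with 2 OSDs and 2 copies, every
OSD can hold a full copy of the image while the primary copy
still varies from object to object.

```python
import hashlib

# Hypothetical two-node cluster; the names are illustrative only.
OSDS = ["osd0", "osd1"]


def place(object_name, osds=OSDS):
    """Return (primary, replica) for one object.

    Assumption: a plain md5 hash picks the primary, and the next
    OSD in the list holds the second copy. This is a stand-in for
    real placement logic, not Ceph's algorithm.
    """
    digest = hashlib.md5(object_name.encode()).digest()
    primary_idx = digest[0] % len(osds)
    replica_idx = (primary_idx + 1) % len(osds)
    return osds[primary_idx], osds[replica_idx]


# File "foo" split into small objects (the naming is an assumption).
objects = ["foo.%04d" % i for i in range(8)]
layout = {name: place(name) for name in objects}

for name, (primary, replica) in layout.items():
    print(name, "primary:", primary, "replica:", replica)

# With 2 OSDs and 2 copies, each OSD ends up holding every object,
# so losing either OSD still leaves one complete copy.
for osd in OSDS:
    held = [name for name, pair in layout.items() if osd in pair]
    print(osd, "holds", len(held), "of", len(objects), "objects")
```

If reads go to the primary, they get spread over both OSDs to
the extent that the hash spreads the primaries, which is the
"A1 B2 A3 B4" picture above without A1 needing a separate
identity from B1: both copies are the same object, and only the
primary role differs.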

Ciao,

-Martin



      