Re: RBD/OSD questions

Gregory Farnum <gregf@xxxxxxxxxxxxxxx> · Thu, 6 May 2010 14:54:04 -0700



2010/5/6 Martin Fick <mogulguy@xxxxxxxxx>:
>> Yeah, you've got it right.  The rbd image is striped
>> over small objects, which are independently assigned
>> to OSDs.  The load should be very well distributed.
>
> How can that be on a 2 OSD setup with double redundancy?
> In this case, if all of a replica's smaller objects are
> not on a single node, how will it recover from an OSD
> failure?
>
> The only way I see this possible is if file foo is
> split into small objects A1 A2 A3 A4 and replicas B1
> B2 B3 B4 and you spread those across 2 OSDs like this:
>
> replica 1 (A1 B2 A3 B4)
> replica 2 (B1 A2 B3 A4)
>
> but then A1 has to know that it is the same as B1.  Is
> that the case?
The hashing probably isn't even enough to strictly alternate the
objects, but yes -- different objects (even those forming a single
"file") will have different primary replicas even in a small system.
Since the default RBD unit is 4MB in size, and the disk is presumably
anywhere from several to hundreds of gigabytes, the image breaks into
on the order of a thousand objects or more, so you've got a reasonably
well-striped system.
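
A rough sketch of the idea, in case it helps. This is not Ceph's actual
placement code: place() is a made-up stand-in that uses a plain SHA-1
hash, object_name() is an invented naming scheme, and the second copy
simply goes on the other OSD. It only illustrates that the 4MB objects
of one image end up with their primaries spread over both OSDs.

```python
import hashlib
from collections import Counter

OBJECT_SIZE = 4 * 1024 * 1024  # 4MB default RBD unit, as mentioned above
NUM_OSDS = 2                   # the 2 OSD setup from the question


def object_name(image, index):
    # Hypothetical naming scheme for illustration; real RBD names differ.
    return f"{image}.{index:012x}"


def place(name, num_osds=NUM_OSDS):
    """Toy placement (an assumption, not Ceph's algorithm).

    A hash of the object name picks the primary OSD; the second copy
    goes on the next OSD, so the two copies never share an OSD.
    """
    digest = hashlib.sha1(name.encode()).digest()
    primary = int.from_bytes(digest[:8], "big") % num_osds
    replica = (primary + 1) % num_osds
    return primary, replica


def primary_counts(image, image_bytes):
    """Count how many objects of the image have each OSD as primary."""
    num_objects = -(-image_bytes // OBJECT_SIZE)  # ceiling division
    return Counter(
        place(object_name(image, i))[0] for i in range(num_objects)
    )


if __name__ == "__main__":
    counts = primary_counts("foo", 10 * 1024**3)  # a 10GB image
    for osd in sorted(counts):
        print(f"osd{osd}: primary for {counts[osd]} objects")
```

With a 10GB image that is 2560 objects, and each OSD should come out as
primary for roughly half of them, though not in a strict alternating
pattern.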
-Greg