Re: RBD/OSD questions

Cláudio Martins <ctpm@xxxxxxxxxx> · Thu, 6 May 2010 22:24:09 +0100

On Thu, 6 May 2010 14:02:40 -0700 (PDT) Martin Fick <mogulguy@xxxxxxxxx> wrote:
> 
> Hmm, I wonder if using a local FS on top of RBD would
> be such a different use case from Ceph that it may
> not be very difficult to produce such a workload.
> With a local FS on RBD I would expect massive local
> kernel-level caching.  With this in mind I wonder how
> effective OSD-level caching would actually be.
> 
> I am particularly thinking of heavy seeky workloads
> which perhaps are already somewhat spread out due to
> striping.  In other words, RAID1 (mirroring) can
> decrease latencies over a non-RAID setup locally even
> though that is not the objective of RAID1, but does
> RAID01 decrease latencies much over RAID0?  Maybe not.
> That might explain the difficulty in creating such
> a scenario.
> 
> To put this in the perspective of OSD setups, if you
> already have striping, using the replicas also may
> not make much of a difference, but I wonder how a two
> node OSD setup with double redundancy would fare?
> With such a setup there will not really be any
> striping, will there?  With such a setup (one that I
> can easily see being popular for simple/minimal RBD
> redundancy setups), perhaps replica "striping"
> would help.  A 'smart' RBD could detect non-contiguous
> reads and spread the reads out in that
> case.
> 

 Unless I misunderstood the Ceph papers, the current situation is
not that bad.

 IIRC, a big file will be striped over many different objects. Each
object ID will map to its own primary replica, which will vary from
object to object. Thus, given many clients reading different chunks of
that file, even 2 OSDs should see a fairly equal amount of traffic. The
same should be true for small files, unless you have lots of clients
all reading the same file.
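
 To make that reasoning concrete, here is a rough sketch of how I
picture it. It is only an illustration: it uses a plain SHA-1 hash
modulo the OSD count as a stand-in for the real placement algorithm
(CRUSH), and the 4 MiB object size, the object naming scheme and all
function names are assumptions made up for the example, not taken from
the Ceph code.

```python
import hashlib
from collections import Counter

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size (4 MiB), illustrative only
NUM_OSDS = 2                   # the two-node setup discussed above


def object_name(file_id: str, offset: int) -> str:
    """Hypothetical naming scheme: file id plus the index of the object
    that contains the given byte offset."""
    return f"{file_id}.{offset // OBJECT_SIZE:08x}"


def primary_osd(name: str, num_osds: int = NUM_OSDS) -> int:
    """Pick a primary OSD for an object by hashing its name.
    This is a simple stand-in for CRUSH, not the real algorithm."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_osds


def primary_load(file_id: str, file_size: int, num_osds: int = NUM_OSDS) -> Counter:
    """Count how many objects of one file have each OSD as their primary."""
    counts = Counter()
    for offset in range(0, file_size, OBJECT_SIZE):
        counts[primary_osd(object_name(file_id, offset), num_osds)] += 1
    return counts


if __name__ == "__main__":
    # A 4 GiB file is split into 1024 objects of 4 MiB each.
    load = primary_load("bigfile", 4 * 1024 ** 3)
    for osd in range(NUM_OSDS):
        print(f"osd{osd}: primary for {load[osd]} objects")
```

 With a reasonably uniform hash, the objects of a large file end up
split roughly evenly between the two primaries, so clients reading
different chunks spread their reads across both OSDs. A small file
that fits in a single object always maps to one primary, which is the
"lots of clients all reading the same file" case.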

 Am I getting it wrong?

Best regards.

Cláudio
