Hi Don,

In a similar situation at the moment. Initially I thought EC pools would be OK for our workload, and I still believe they are; however, the current cache tiering code seems to hamper performance, as for every read and write the whole object has to be promoted or demoted. This has a very severe performance impact, especially with small IOs, if a large number fall outside the cache.

During testing I found that decreasing the object size improved performance somewhat, as less data needed to be moved between the tiers and the cache was more effective due to better granular selection of hot blocks. See my post a few weeks back for more details. I also suspected that the cache tier was not allowing concurrent promotions/demotions, or that something was blocking somewhere, as I didn't really see an increase in performance with higher queue depths.

Currently I have had to drop the idea of using EC pools and am using a 3-way replicated pool. Whilst this means I have a lot less available storage, it should be enough for the meantime.

I'm hoping performance will improve in future releases. Hammer will introduce proxy reads, which will mean that blocks won't have to be promoted on every read; in theory this could halve the IO requirements. Other ideas are to use something like flashcache or EnhanceIO to cache the RBD device itself. Depending on whether the RBD has shared access amongst multiple clients, this can be rather simple or very difficult.

Nick

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Don Doerner
Sent: 17 March 2015 17:08
To: Ceph-users
Subject: Reliable OSD

Situation: I need to use EC pools (for the economics/power/cooling) for the storage of data, but my use case requires a block device. Ergo, I require a cache tier. I have tried using a 3x replicated pool as a cache tier - the throughput was poor, mostly due to latency, mostly due to device saturation (i.e., of the tier devices), mostly due to seeking.
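[For reference, the object-size reduction Nick describes above is set per image at creation time. A minimal sketch, assuming an RBD image created for this purpose; the pool and image names are placeholders, and the PG/cluster specifics are not shown:]

```shell
# Create an RBD image with 1 MiB objects (--order 20) instead of the
# default 4 MiB (--order 22), so each cache-tier promotion/demotion
# moves less data. "rbd" and "test-image" are example names.
rbd create rbd/test-image --size 10240 --order 20

# Verify the object size of the new image.
rbd info rbd/test-image
```

[Note that the object size cannot be changed on an existing image; data would need to be migrated to a newly created image.]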
Data on the cache tier is tactical: it's going to get pushed off the cache tier into the EC pool relatively quickly. RAID-6 protection (which is roughly the same as I get with a 3x replicated pool) is fine. I happen to have the ability to create a small RAID-6 (on each of several nodes) that could collectively serve as a cache tier. And the RAID controller has a battery, so it can operate write-back, so latency goes way down.

Can I create a pool of unreplicated OSDs, i.e., can I set the size of the pool to 1? It seems like this creates a singularity when it comes to CRUSH: do placement groups even make sense? Or is there any way that I can use my RAID hardware to build a low-latency cache tier?

Regards,

-don-

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
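[For what it's worth, size=1 pools are permitted: placement groups still function, each PG simply maps to a single OSD, so the RAID layer underneath would provide the redundancy instead of Ceph. A minimal, untested sketch of wiring such a pool in as a writeback cache tier; the pool names and PG counts here are examples only:]

```shell
# Example EC base pool and a size-1 replicated cache pool.
# Pool names and PG counts are placeholders.
ceph osd pool create ecpool 128 128 erasure
ceph osd pool create cachepool 128 128 replicated
ceph osd pool set cachepool size 1
ceph osd pool set cachepool min_size 1

# Attach the cache pool to the EC pool in writeback mode and route
# client IO through it.
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool
```

[The usual caveat applies: with size 1, losing an OSD before data is flushed to the base tier means losing that data, so this only makes sense if the underlying RAID-6 is trusted to survive device failures.]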