Re: Ceph cache tier and rbd volumes/SSD primary, HDD replica crush rule!

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Mihai Gheorghe
> Sent: 12 January 2016 14:56
> To: Nick Fisk <nick@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Ceph cache tier and rbd volumes/SSD primary, HDD
> replica crush rule!
> 
> Thank you very much for the quick answer.
> 
> I suppose the cache tier works the same way for object storage as well!?

Yes, exactly the same. The cache sits at the object layer anyway, so the behaviour is identical. If you are working at the object level you can also pin/unpin individual objects in the cache:

https://github.com/ceph/ceph/pull/6326
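
If I remember correctly, that change also adds cache-pin/cache-unpin subcommands to the rados tool, so pinning an object would look roughly like this (pool and object names here are just placeholders):

  rados -p cache-pool cache-pin some-object     # keep this object in the cache tier
  rados -p cache-pool cache-unpin some-object   # allow it to be flushed/evicted again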

> 
> How is a delete of a Cinder volume handled? I ask this because after the
> volume got flushed to the cold storage, I then deleted it from Cinder. It was
> deleted from the cache pool as well, but on the HDD pool, when issuing
> "rbd -p <pool> ls", the volumes were gone yet the space was still in use
> (probably RADOS data) until I manually ran a flush command on the cache
> pool (I didn't wait long enough to see whether the space would be freed
> over time). It is probably a misconfiguration on my end though.

Ah yes, this is one of my pet hates. It's actually slightly worse than you describe: all the objects have to be promoted into the cache tier to be deleted, and then flushed afterwards to remove them from the base tier as well. For a large image this can take quite a long time. Hopefully this will be fixed at some point; I don't believe it would be too difficult to fix.
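
If you want the space back straight away rather than waiting for the agent, you can force the flush/evict from the cache pool yourself, something like this (pool name is a placeholder):

  rados -p cache-pool cache-flush-evict-all

That should push any dirty objects, including the deletes, down to the base tier and then evict them from the cache.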

> 
> In your opinion, is cache tiering ready for production? I have read that bcache
> (flashcache?) is used in favor of cache tiering, but it is not that simple to set
> up and there are disadvantages there as well.

See my recent posts about cache tiering: there is a fairly major bug which limits performance if your working set doesn't fit in the cache. Assuming you are running the patch for that bug, and you can live with the deletion problem above, then yes, I would say it's usable in production. I'm planning to enable it on the production pool in my cluster in the next couple of weeks.

> 
> Also, is there a problem if I add a cache tier to an already existing pool that has
> data on it? Or should the pool be empty prior to adding the cache tier?

Nope, that should be fine.
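
For reference, attaching a cache tier to a pool that already holds data is just the usual sequence from the cache tiering docs, along these lines (pool names are placeholders):

  ceph osd tier add cold-pool cache-pool
  ceph osd tier cache-mode cache-pool writeback
  ceph osd tier set-overlay cold-pool cache-pool
  ceph osd pool set cache-pool hit_set_type bloom

Existing objects stay in the base pool and only get promoted into the cache as they are accessed.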

> 
> 2016-01-12 16:30 GMT+02:00 Nick Fisk <nick@xxxxxxxxxx>:
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> Of
> > Mihai Gheorghe
> > Sent: 12 January 2016 14:25
> > To: ceph-users@xxxxxxxxxxxxxx
> > Subject:  Ceph cache tier and rbd volumes/SSD primary, HDD
> > replica crush rule!
> >
> > Hello,
> >
> > I have a question about how cache tiering works with RBD volumes!?
> >
> > So I created a pool of SSDs for cache and a pool of HDDs for cold storage
> > that acts as the backend for Cinder volumes. I create a volume in Cinder
> > from an image and spawn an instance. The volume is created in the cache
> > pool as expected and, as I understand it, will be flushed to the cold
> > storage after a period of inactivity or after the cache pool reaches 40% full.
> 
> Cache won't be flushed after inactivity; the cache agent only works on % full
> (either # of objects or bytes).
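
Those thresholds are all per-pool settings on the cache pool; the numbers below are only illustrative, size them for your SSDs:

  ceph osd pool set cache-pool target_max_bytes 1099511627776
  ceph osd pool set cache-pool target_max_objects 1000000
  ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
  ceph osd pool set cache-pool cache_target_full_ratio 0.8

The agent starts flushing dirty objects once the dirty ratio of target_max_bytes/objects is crossed, and starts evicting once the full ratio is crossed.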
> 
> >
> > Now, after the volume is flushed to the HDDs and I make a read or write
> > request in the guest OS, how does Ceph handle it? Does it promote the whole
> > RBD volume from the cold storage to the cache pool, or only the chunk of it
> > that the guest OS request touches?
> 
> The cache works on hot objects, so particular objects (normally 4MB) of the
> RBD will be promoted/demoted over time depending on access patterns.
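
You can check what object size an image was created with from rbd info; the "order" field gives it (order 22 = 4 MB objects). Pool/image names below are placeholders:

  rbd info cold-pool/my-volume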
> 
> >
> > Also, is the replication in Ceph synchronous or async? If I set a CRUSH rule
> > to use the SSD host as primary and the HDD host for the replicas, would the
> > writes and reads on the SSDs be slowed down by the replication on the
> > mechanical drives?
> > Would this configuration be viable? (I ask this because I don't have enough
> > SSDs to make a pool of size 3 on them.)
> 
> It's sync replication. If you have a very heavy read workload, you can do what
> you suggest and set the SSD OSDs to hold the primary copy of each PG. Writes
> will still be limited to the speed of the spinning disks, but reads will be
> serviced from the SSDs. However, there is a risk that in degraded scenarios
> your performance could drop dramatically if more IO is diverted to the
> spinning disks.
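
A CRUSH rule for that looks roughly like the "ssd-primary" example in the CRUSH docs: take the first (primary) copy from an SSD root and the remaining copies from an HDD root. A sketch, assuming your CRUSH map already has separate ssd and hdd roots (the ruleset number is arbitrary):

  rule ssd-primary {
          ruleset 5
          type replicated
          min_size 1
          max_size 10
          step take ssd
          step chooseleaf firstn 1 type host
          step emit
          step take hdd
          step chooseleaf firstn -1 type host
          step emit
  }

You would then point the pool at it with something like "ceph osd pool set <pool> crush_ruleset 5".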
> 
> >
> > Thank you!


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


