Re: Local SSD cache for ceph on each compute node.

On 03/29/2016 01:35 PM, Van Leeuwen, Robert wrote:
>> If you try to look at the rbd device under dm-cache from another host, of course
>> any data that was cached on the dm-cache layer will be missing, since the
>> dm-cache device itself is local to the host you originally wrote the data from.
> And here it can (and probably will) go horribly wrong.
> If you lose the dm-cache device (cache/hypervisor failure), you will probably end up with an inconsistent filesystem.
> This is because dm-cache is not an ordered write-back cache, AFAIK.

I think that you are conflating two unrelated points.

dm-cache does do proper ordering.

If you use it to cache writes and then take it effectively out of the picture
(i.e., never destage that data from cache), you end up with holes in a file system.

Nothing to do with ordering, all to do with having a write back cache enabled
and then chopping the write back cached data out of the picture.
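
To make that failure mode concrete, here is a minimal, purely illustrative Python sketch of a write-back cache sitting in front of a backing device (a toy model, not how dm-cache is actually implemented):

    # Toy write-back cache -- illustrative only, not dm-cache's real logic.
    class WriteBackCache:
        def __init__(self, backing):
            self.backing = backing   # dict of block -> data, standing in for the RBD image
            self.dirty = {}          # blocks acknowledged to the client but not yet destaged

        def write(self, block, data):
            # The write completes as soon as it hits the local cache device.
            self.dirty[block] = data

        def read(self, block):
            return self.dirty.get(block, self.backing.get(block))

    rbd = {}                         # pretend RBD image
    cache = WriteBackCache(rbd)
    for block in range(100):
        cache.write(block, b"metadata or hot data")

    # Lose the cache device before destaging and the RBD image holds none of
    # those blocks -- the filesystem on top of it is full of holes:
    print(len(rbd))                  # 0

The filesystem's metadata can easily reference blocks that, on the surviving RBD image, were simply never written.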
> Way back in this thread it was mentioned you would just lose a few seconds of data when you lose the cache device.
>
> My point was that when you lose the cache device you do not just miss x seconds of data but probably lose the whole filesystem.

True.

> This is because the cache is not “ordered” and random parts, probably the “hot data” you care about, never made it from the cache device into Ceph.

Totally unrelated.


> However, if the write cache would be "flushed in-order" to Ceph, you would just lose x seconds of data and, hopefully, not have a corrupted disk.
> That could be acceptable for some people. I was just stressing that that isn’t the case.

This in-order assumption - speaking as someone with a long history in kernel file and storage work - is the wrong assumption.

Don't think of the cache device and the RBD as separate devices: once they are configured like this, they are the same device from the point of view of the file system (or whatever else) runs on top of them.

The cache and its caching policy can vary, but it is perfectly reasonable to have data live only in that caching layer pretty much forever. Local disk caches can also do this by the way :)

The whole flushing order argument is really not relevant here. I could "flush in order" after a minute, a week or a year. If the cache is large enough, you might have zero data land on the backing store (even if the destage policy flushes in order, as you suggest).
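
As an illustration (again a toy model, with made-up numbers): suppose a hypothetical destage policy only flushes, oldest-first, once the cache passes some fill threshold. With a big cache and a working set that fits in it, that threshold is never crossed and nothing ever reaches the RBD image, in order or otherwise:

    # Hypothetical destage policy: flush dirty blocks oldest-first, but only
    # once the cache is more than 80% full.  Numbers are made up.
    CACHE_BLOCKS = 1_000_000         # a large local SSD cache
    dirty = set(range(10_000))       # a hot working set that fits comfortably

    destaged = []
    if len(dirty) > 0.8 * CACHE_BLOCKS:
        destaged = sorted(dirty)     # "in order" -- but this branch never runs

    print(len(destaged))             # 0: ordering was irrelevant, nothing landed on the RBD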

That all said, the reason to use a write cache on top of a client block device - rbd or other - is to improve performance for the client.

Any time we make our failure domain require two fully operating devices (the cache device and the original device), we increase the probability of a non-recoverable failure. In effect, the combined storage is at best as reliable as the least reliable part of the pair.
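
Back-of-the-envelope, with entirely made-up numbers: if the local cache device (plus its hypervisor) survives a given period with probability 0.99 and the Ceph/RBD side with probability 0.999, then with write-back caching you only keep a consistent volume when both survive:

    # Hypothetical survival probabilities -- purely illustrative numbers.
    p_cache   = 0.99     # local SSD / hypervisor holding the dm-cache device
    p_backing = 0.999    # the RBD image in Ceph

    # With a write-back cache, losing either side loses (or corrupts) the
    # volume, so the combined survival probability is the product:
    p_pair = p_cache * p_backing
    print(round(p_pair, 5))   # ~0.98901 -- worse than either device on its own

So the pair is never more reliable than its weakest member, and in practice a bit less.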

> If you use dm-cache as a write-through cache, this is not a problem (i.e., it
> would only be used to cache reads).
Caching reads is fine :)
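
For contrast, a write-through cache only ever holds data that has already reached the RBD image, so losing it costs you warm reads, never consistency. Same toy-model caveats as above:

    # Toy write-through cache -- illustrative only.
    class WriteThroughCache:
        def __init__(self, backing):
            self.backing = backing   # standing in for the RBD image
            self.cache = {}          # read cache on the local SSD

        def write(self, block, data):
            # The write is not acknowledged until the backing store has it.
            self.backing[block] = data
            self.cache[block] = data

        def read(self, block):
            # Reads are served from the local SSD when possible.
            return self.cache.get(block, self.backing.get(block))

    rbd = {}
    cache = WriteThroughCache(rbd)
    cache.write(0, b"data")
    print(len(rbd))          # 1: everything written is already on the RBD image

If the cache device disappears, the RBD image is still complete and consistent; only read latency suffers until a new cache warms up.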

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



