On 06/01/2015 03:41 AM, Jan Schermer wrote:
> Thanks, that’s it exactly.
> But I think that’s really too much work for now, which is why I would like to see a quick win from using the local RBD cache instead. That would suffice for most workloads (not many people run big databases on Ceph yet, and those who do must be aware of this).
> The issue, and I have not yet seen an answer to it, is this: would it be safe as things stand if flushes were ignored (rbd cache = unsafe), or would it completely b0rk the filesystem when data is not flushed properly?
Generally the latter. Right now flushes are the only thing enforcing
ordering for rbd. As a block device it doesn't guarantee that e.g. the
extent at offset 0 is written before the extent at offset 4096 unless
it sees a flush between the writes.
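To make the ordering point concrete, here is a toy model in Python. It is a sketch under simplified, assumed semantics, not librbd's actual code: writes issued since the last flush may become durable in any order (so any subset can survive a crash), and a flush completes only after every earlier write is durable. The function name `possible_crash_states` and the labels such as "write@0" are hypothetical.

```python
from itertools import combinations


def possible_crash_states(ops):
    """Return every set of writes that could be durable after a crash.

    Simplified, hypothetical model (not librbd's actual code):
    - writes issued since the last flush may reach stable storage in any
      order, so any subset of them can survive a crash;
    - a flush completes only once all earlier writes are durable, and the
      issuer waits for that before sending further writes.
    ops is a list of write labels (e.g. "write@0") and the string "flush".
    """
    # Split the op stream into epochs separated by flushes.
    epochs = [[]]
    for op in ops:
        if op == "flush":
            epochs.append([])
        else:
            epochs[-1].append(op)

    states = set()
    durable = []  # writes guaranteed durable by previously completed flushes
    for epoch in epochs:
        # A crash during this epoch keeps everything flushed so far plus
        # an arbitrary subset of this epoch's writes.
        for r in range(len(epoch) + 1):
            for subset in combinations(epoch, r):
                states.add(frozenset(durable + list(subset)))
        durable = durable + epoch
    return states


no_flush = possible_crash_states(["write@0", "write@4096"])
with_flush = possible_crash_states(["write@0", "flush", "write@4096"])
```

Without a flush, `{"write@4096"}` on its own is a possible post-crash state, so the later extent can land without the earlier one. With a flush in between, every state that contains "write@4096" also contains "write@0".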
As suggested earlier in this thread, maintaining order during writeback
would make it safer, from a crash-consistency point of view, to not send
flushes at all (via mount -o nobarrier in the guest or cache=unsafe for
qemu). An fs or database on top of rbd would still have to replay its
internal journal, and could lose some writes, but should be able to
end up in a consistent state that way. This would make larger caches
more useful, and would be a simple way to use large local cache
devices as an rbd cache backend. Live migration should still work in
such a system because qemu will still tell rbd to flush data at that
point.
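The ordered-writeback idea can be sketched as follows, again in Python and under stated assumptions. `OrderedWritebackCache` and `is_prefix_image` are hypothetical names, not an existing rbd interface. In this sketch, flushes are acknowledged immediately (as with cache=unsafe or nobarrier), writeback drains a local log strictly in issue order, and a crash pessimistically loses the whole local log. The claim being illustrated is that whatever survives on the cluster side is always the result of some prefix of the issued writes, which is what a journaling fs or database can recover from.

```python
from collections import deque


class OrderedWritebackCache:
    """Hypothetical write-back cache that ignores flushes but keeps order."""

    def __init__(self):
        self.log = deque()   # pending (offset, data) in issue order
        self.backend = {}    # offset -> data as stored on the cluster side
        self.issued = []     # every acknowledged write, kept for checking

    def write(self, offset, data):
        # Acknowledge as soon as the write is in the local log.
        self.log.append((offset, data))
        self.issued.append((offset, data))

    def flush(self):
        # cache=unsafe / nobarrier: acknowledged without waiting for writeback.
        pass

    def writeback(self, n):
        """Drain up to n of the oldest pending writes, oldest first."""
        for _ in range(min(n, len(self.log))):
            offset, data = self.log.popleft()
            self.backend[offset] = data

    def crash(self):
        """Pessimistic crash: lose the local log, return the backend image."""
        self.log.clear()
        return dict(self.backend)


def is_prefix_image(image, issued):
    """True if image equals the result of applying some prefix of issued."""
    for k in range(len(issued) + 1):
        state = {}
        for offset, data in issued[:k]:
            state[offset] = data
        if state == image:
            return True
    return False
```

For example, after writing offset 0, flushing, then writing offset 4096 and overwriting offset 0, a crash after any amount of writeback leaves a prefix image. An image holding only the 4096 write, which unordered writeback could produce, is not one. A cache that coalesced overwrites or sorted pending writes by offset for efficiency would lose this property.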
A distributed local cache like [1] might be better long term, but
much more complicated to implement.
Josh
[1] https://www.usenix.org/conference/fast15/technical-sessions/presentation/bhagwat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com