On Thu, Jul 25, 2013 at 4:01 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> I've added a blueprint for avoiding double-writes when using btrfs:
>
>   http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_clone_from_journal_on_btrfs
>
> This should improve throughput significantly when the journal is a file in
> btrfs.
>
> ---
>
> Also, there's one for improving the localized read behavior:
>
>   http://wiki.ceph.com/01Planning/02Blueprints/Emperor/librados%2F%2Fobjecter%3A_smarter_localized_reads
>
> For example, for read-only parents of rbd clones, we may as well read from
> the replica in the same host or rack or row--whatever crush can tell
> us--and not the primary.  This is good for locality and load distribution
> when certain object sets are hot.

This blueprint includes work items to set locality information in
libcephfs and via the Hadoop bindings. However, there's still a read
hole issue with read-from-replicas [1] that makes this generally
unwise. Did you consider that when writing this blueprint? In
particular, I think we want to discuss if we allow people to use a more
powerful read-from-replica unless we can guarantee their usage of it
is safe (i.e., snapshots).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

[1]: http://tracker.ceph.com/issues/5388