On Thu, 25 Jul 2013, Gregory Farnum wrote:
> On Thu, Jul 25, 2013 at 4:28 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > On Thu, 25 Jul 2013, Gregory Farnum wrote:
> >> On Thu, Jul 25, 2013 at 4:01 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> >> > I've added a blueprint for avoiding double-writes when using btrfs:
> >> >
> >> > http://wiki.ceph.com/01Planning/02Blueprints/Emperor/osd:_clone_from_journal_on_btrfs
> >> >
> >> > This should improve throughput significantly when the journal is a file in
> >> > btrfs.
> >> >
> >> > ---
> >> >
> >> > Also, there's one for improving the localized read behavior:
> >> >
> >> > http://wiki.ceph.com/01Planning/02Blueprints/Emperor/librados%2F%2Fobjecter%3A_smarter_localized_reads
> >> >
> >> > For example, for read-only parents of rbd clones, we may as well read from
> >> > the replica in the same host or rack or row--whatever crush can tell
> >> > us--and not the primary. This is good for locality and load distribution
> >> > when certain object sets are hot.
> >>
> >> This blueprint includes work items to set locality information in
> >> libcephfs and via the Hadoop bindings. However, there's still a read
> >> hole issue with read-from-replicas [1] that makes this generally
> >> unwise. Did you consider that when writing this blueprint?
> >> In particular I think we want to discuss if we allow people to use a
> >> more powerful read-from-replica unless we can guarantee their usage of
> >> it is safe (ie, snapshots).
> >
> > Yeah, there's an open bug for that, but the solution doesn't seem
> > interesting enough to warrant a CDS discussion...
> >
> > http://tracker.ceph.com/issues/5388
> >
> > But if I'm wrong, by all means write one! :)
>
> I didn't think we had a solution yet, since your last words there are
> "the fix on the OSD is going to be a bit more involved". :p
> That doesn't mean we shouldn't do this, I just thought it was a
> problem that needed to be part of the blueprint when designing and
> implementing this, whether it's the user's problem to handle properly,
> or we want to lock it out in ways we can be reasonably sure are safe,
> or if we expect the local read issue to be resolved before this is
> completed.

I'm assuming it's a matter of using the ObjectContexts on the replicas,
but perhaps not. In any case, I'm operating on the assumption that this
is a bug that must be resolved before the smarter localized reads are
usable, but that the bug isn't interesting enough to discuss. If you
disagree, write or update the blueprint :)

sage