On Wed, Mar 9, 2016 at 1:47 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Wed, 9 Mar 2016, Gregory Farnum wrote:
>> On Wed, Mar 9, 2016 at 12:42 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > Resurrecting an old thread.
>> >
>> > I think we really want to make these semantic changes to current rados
>> > ops (like delete) to make life better going forward. Ideally shortly
>> > after jewel so that they have plenty of time to bake before K and L.
>> >
>> > I'm wondering if the way to make this change visible to users is to
>> > (finally) rev librados to librados3. We can take the opportunity to make
>> > any other pending cleanups to the public API as well...
>>
>> Yep. I presume you're thinking of this because of
>> http://tracker.ceph.com/issues/14468? It looks like we didn't really
>> have any good solutions for that pipelining problem though; any new
>> suggestions?
>
> Yeah, I'm still not very happy with either alternative:
>
> 1) We persistently record the reqid and return value in the pg log. This
> turns failed rw ops into a replicated (metadata) write, which sort of
> sucks. It also means that we probably *wouldn't* store any reply payload,
> which means we lose the ability to have a failure return useful data
> (e.g., info about why it failed).

This inability to return data on writes has pretty persistently sucked
for us... I wonder if we should be attacking it from that direction
instead. We just don't want pglog entries to get that large and are
worried about being able to reproduce the data on replay, right?
Perhaps we could add some kind of limited-size lookaside thing.

Given that RW ops *are* a write on success (whatever "success" means
in the op's context) I'm not so concerned about turning them into
writes even if they would have been a read.

The other option is #2, which as you note might have some serious
performance implications on the client side. :/
-Greg

>
> 2) The objecter prevents rw ops from being pipelined.
> This means a hash table in the objecter so that it transparently blocks
> subsequent requests to the same object. Or,
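
For illustration only, here is a minimal sketch of what alternative 2 might look like in the abstract: a hash table keyed by object that lets one rw op through and holds later ops on the same object until the first one completes. This is not Ceph's Objecter code. The class name `PerObjectGate`, its `submit`/`complete` methods, and the `send` callback are all hypothetical, and real ops would be richer than the plain strings used here.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable


class PerObjectGate:
    """Hypothetical client-side gate: at most one rw op in flight per object."""

    def __init__(self, send: Callable[[Hashable, str], None]) -> None:
        # `send` stands in for whatever actually puts the op on the wire.
        self._send = send
        # Object id -> ops waiting behind the one currently in flight.
        # A key is present exactly while an op for that object is in flight.
        self._waiting: Dict[Hashable, Deque[str]] = {}

    def submit(self, oid: Hashable, op: str) -> bool:
        """Send `op` now if the object is idle; otherwise queue it.

        Returns True if the op was sent immediately, False if it was held.
        """
        if oid in self._waiting:
            self._waiting[oid].append(op)
            return False
        self._waiting[oid] = deque()
        self._send(oid, op)
        return True

    def complete(self, oid: Hashable) -> None:
        """Call when the reply for the in-flight op on `oid` has arrived."""
        queue = self._waiting[oid]
        if queue:
            # Release the next held op; the object stays marked busy.
            self._send(oid, queue.popleft())
        else:
            # Nothing waiting: the object is idle again.
            del self._waiting[oid]


# Usage: record the order in which ops would actually be sent.
sent = []
gate = PerObjectGate(lambda oid, op: sent.append((oid, op)))
gate.submit("obj_a", "op1")  # sent immediately
gate.submit("obj_a", "op2")  # held: obj_a already has an op in flight
gate.submit("obj_b", "op3")  # sent immediately: different object
gate.complete("obj_a")       # reply for op1 arrives, op2 is released
```

In this sketch ops on different objects still overlap, but each held op on a busy object waits for a full round trip, which is the client-side performance cost mentioned above.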