Re: rados semantic changes

On Wed, Mar 9, 2016 at 1:56 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Wed, Mar 9, 2016 at 1:47 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> On Wed, 9 Mar 2016, Gregory Farnum wrote:
>>> On Wed, Mar 9, 2016 at 12:42 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>> > Resurrecting an old thread.
>>> >
>>> > I think we really want to make these semantic changes to current rados
>>> > ops (like delete) to make life better going forward.  Ideally shortly
>>> > after jewel so that they have plenty of time to bake before K and L.
>>> >
>>> > I'm wondering if the way to make this change visible to users is to
>>> > (finally) rev librados to librados3.  We can take the opportunity to make
>>> > any other pending cleanups to the public API as well...
>>>
>>> Yep. I presume you're thinking of this because of
>>> http://tracker.ceph.com/issues/14468? It looks like we didn't really
>>> have any good solutions for that pipelining problem though; any new
>>> suggestions?
>>
>> Yeah, I'm still not very happy with either alternative:
>>
>> 1) We persistently record the reqid and return value in the pg log.  This
>> turns failed rw ops into a replicated (metadata) write, which sort of
>> sucks.  It also means that we probably *wouldn't* store any reply payload,
>> which means we lose the ability to have a failure return useful data
>> (e.g., info about why it failed).
>
> This inability to return data on writes has pretty persistently sucked
> for us... I wonder if we should be attacking it from that direction
> instead. We just don't want pglog entries to get that large and are
> worried about being able to reproduce the data on replay, right?
> Perhaps we could add some kind of limited-size lookaside thing. Given
> that RW ops *are* a write on success (whatever "success" means in the
> op's context) I'm not so concerned about turning them into writes even
> if they would have been a read. The other option is #2, which as you
> note might have some serious performance implications on the client
> side. :/

+1

That applies both to usefulness for client applications and to the
performance implications. Maybe we can relax the guarantees on the data
returned from writes, so that the size of the stored reply doesn't
become an issue?
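
To make the "limited-size lookaside" idea and the relaxed guarantee concrete,
here is a rough sketch in Python. It is not how the OSD or the pg log is
implemented; ReplyLookaside, handle_op, max_entries and max_payload are all
made-up names for illustration. The assumption is that the return code for a
reqid is always kept while it is in the table, and the reply payload is kept
only if it is small, so a resent op may get the code back without the data.

```python
from collections import OrderedDict


class ReplyLookaside:
    """Hypothetical bounded table mapping reqid -> (retval, payload or None)."""

    def __init__(self, max_entries=1024, max_payload=64):
        self.max_entries = max_entries
        self.max_payload = max_payload
        self._entries = OrderedDict()

    def record(self, reqid, retval, payload=b""):
        # Relaxed guarantee: an oversized payload is dropped, the return
        # code is still recorded.
        if len(payload) > self.max_payload:
            payload = None
        self._entries[reqid] = (retval, payload)
        self._entries.move_to_end(reqid)
        # Keep the table bounded by evicting the oldest entries.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def lookup(self, reqid):
        return self._entries.get(reqid)


def handle_op(table, reqid, execute):
    """Run execute() once per reqid; a resend replays the recorded result."""
    cached = table.lookup(reqid)
    if cached is not None:
        return cached
    retval, payload = execute()
    table.record(reqid, retval, payload)
    return table.lookup(reqid)
```

With this shape a failed op that is resent gets the same return code without
being executed again, and the cost of remembering it is bounded by
max_entries * max_payload. Once an entry has been evicted the resend is
executed again, which is the kind of weakened guarantee I mean.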

Yehuda
