Re: Sessions and Persistence

On Fri, Apr 15, 2016 at 3:29 PM, Adam C. Emerson <aemerson@xxxxxxxxxx> wrote:
> On 15/04/2016, Gregory Farnum wrote:
>> So the most common time we really get replay operations is when one of
>> the OSDs crashes or a PG's acting set changes for some other reason.
>> Which means these "cached" operation results need to be persisted to
>> disk and then cleaned up, a la the pglog.
>> I don't see anything in these data structures that explains how we do
>> that efficiently, which is the biggest problem and the reason we don't
>> already do reply caching. Am I missing something?
>
> So! I had been considering the usual case of resend to be a transient connection
> drop between client and OSD. (An example of why feedback is nice :)

Well, I guess I don't have in-the-field information about the relative
prevalence of these scenarios. But we definitely can't include
features in RADOS that work "as long as you don't have acting set
changes". ;)

>
> I /had/ thought of persisting these things as a possible feature we would want to
> add that administrators could turn on or off depending on the level of
> reliability they wanted (and if they had some NVRAM on the machine.)
>
> I had not thought specifically about persisting them QUICKLY in the
> spinning disk case. One optimization would be refusing to cache read-only
> ops so we don't have to pay for a disk-write unless we're using a disk
> write. My intuition would suggest a per-OSD op-log that gets written
> and committed when the PGLog entry gets committed, but I admit that's
> just spur of the moment. It needs a bit more design work, but bundling
> it with some of the writes we have to do already seems promising.

This is something I've suggested in the past, but I think it's at the
stage where somebody needs to write code demonstrating it is something
approaching performant. If it is, I don't think anybody opposes the
idea; if it's not, then throughput/IOP regressions are not a tradeoff
Sam/Sage are willing to make for this IIRC (and, though I am more
optimistic than I remember them being about our odds of success, I
suppose I'm not either).
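
To make the quoted proposal concrete, here is a rough sketch of the shape it
might take: a per-OSD op-log whose reply entries ride along in the same commit
as the PGLog entry, with read-only ops never cached and old entries trimmed by
count. This is only an illustration in Python, not Ceph code. Every name in it
(OpLog, Transaction, handle_op, max_entries) is hypothetical, the "disk" is a
dictionary, and the trimming policy is an assumption. It says nothing about
whether the real thing would be performant, which is the open question.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical request id: (client name, client-assigned sequence number).
ReqId = Tuple[str, int]


@dataclass
class Transaction:
    """Stand-in for one atomic on-disk commit (hypothetical)."""
    pglog_entries: List[str] = field(default_factory=list)
    oplog_entries: List[Tuple[ReqId, str]] = field(default_factory=list)


class OpLog:
    """Simulated per-OSD op-log holding cached replies for write ops."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self.durable: "OrderedDict[ReqId, str]" = OrderedDict()
        self.commits = 0  # number of simulated disk commits

    def commit(self, txn: Transaction) -> None:
        # One commit covers both the PGLog entries and the cached replies,
        # so caching a reply adds no separate disk write.
        self.commits += 1
        for reqid, reply in txn.oplog_entries:
            self.durable[reqid] = reply
        # Trim oldest entries first (assumed policy, loosely like the pglog).
        while len(self.durable) > self.max_entries:
            self.durable.popitem(last=False)

    def lookup(self, reqid: ReqId) -> Optional[str]:
        return self.durable.get(reqid)


def handle_op(log: OpLog, reqid: ReqId, is_write: bool,
              pglog_entry: Optional[str]) -> str:
    cached = log.lookup(reqid)
    if cached is not None:
        return cached  # resent op: answer from the persisted reply
    if not is_write:
        return "read-ok"  # read-only ops are never cached, so no disk write
    reply = f"write-ok:{pglog_entry}"
    log.commit(Transaction([pglog_entry], [(reqid, reply)]))
    return reply
```

With max_entries=2, a resent write gets its original reply back without a
second commit, a read costs no commit at all, and the oldest reply drops out
once a third distinct write lands.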
-Greg