On Fri, Apr 15, 2016 at 3:29 PM, Adam C. Emerson <aemerson@xxxxxxxxxx> wrote:
> On 15/04/2016, Gregory Farnum wrote:
>> So the most common time we really get replay operations is when one of
>> the OSDs crashes or a PG's acting set changes for some other reason.
>> Which means these "cached" operation results need to be persisted to
>> disk and then cleaned up, a la the pglog.
>> I don't see anything in these data structures that explains how we do
>> that efficiently, which is the biggest problem and the reason we don't
>> already do reply caching. Am I missing something?
>
> So! I had been considering the usual case of resend to be transient connection
> drop between client and OSD. (An example of why feedback is nice :)

Well, I guess I don't have in-the-field information about the relative
prevalence of these scenarios. But we definitely can't include
features in RADOS that work "as long as you don't have acting set
changes". ;)

>
> I /had/ thought of persisting these things as a possible feature we would want to
> add that administrators could turn on or off depending on the level of
> reliability they wanted (and if they had some NVRAM on the machine.)
>
> I had not thought specifically about persisting them QUICKLY in the
> spinning disk case. One optimization would be refusing to cache read-only
> ops so we don't have to pay for a disk-write unless we're using a disk
> write. My intuition would suggest a per-OSD op-log that gets written
> and committed when the PGLog entry gets committed, but I admit that's
> just spur of the moment. It needs a bit more design work, but bundling
> it with some of the writes we have to do already seems promising.

This is something I've suggested in the past, but I think it's at the
stage where somebody needs to write code demonstrating it is something
approaching performant.
If it is, I don't think anybody opposes the idea; if it's
not, then throughput/IOP regressions are not a tradeoff Sam/Sage are
willing to make for this IIRC (and, though I am more optimistic than I
remember them being about our odds of success, I suppose I'm not
either).
-Greg
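
For readers following the thread, the idea quoted above (write the cached
reply in the same commit as the PGLog entry, and never cache read-only ops)
might look roughly like the toy model below. This is only a sketch under
stated assumptions, not Ceph code: FakeStore, OSDSketch, and the record
tuples are all hypothetical names invented for illustration, the "disk" is
an in-memory list, and trimming is shown as its own commit for simplicity,
whereas a real design would presumably fold it into an existing write.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical stand-in for an OSD's backing store; counts commits."""
    records: list = field(default_factory=list)
    commits: int = 0

    def commit(self, batch):
        # One commit persists every record in the batch together.
        self.records.extend(batch)
        self.commits += 1


class OSDSketch:
    """Toy OSD: a write persists its reply alongside its pglog entry."""

    def __init__(self, store):
        self.store = store
        self.version = 0
        self.objects = {}   # oid -> data
        self.replies = {}   # reqid -> (version, result)
        # Rebuild in-memory state from persisted records, as after a crash.
        for rec in store.records:
            self._apply(rec)

    def _apply(self, rec):
        kind = rec[0]
        if kind == "pglog":
            _, version, oid, data = rec
            self.objects[oid] = data
            self.version = version
        elif kind == "oplog":
            _, version, reqid, result = rec
            self.replies[reqid] = (version, result)
        elif kind == "trim":
            _, upto = rec
            # Drop cached replies at or below the trim point.
            self.replies = {k: v for k, v in self.replies.items()
                            if v[0] > upto}

    def write(self, reqid, oid, data):
        if reqid in self.replies:
            # Resent op: return the cached reply, no re-execution, no commit.
            return self.replies[reqid][1]
        version = self.version + 1
        result = ("ok", version)
        # The reply record rides along in the same commit as the pglog entry.
        batch = [("pglog", version, oid, data),
                 ("oplog", version, reqid, result)]
        self.store.commit(batch)
        for rec in batch:
            self._apply(rec)
        return result

    def read(self, reqid, oid):
        # Read-only ops are never cached, so they cost no commit.
        return self.objects.get(oid)

    def trim(self, upto):
        # Clean up old replies, analogous to trimming the pglog.
        rec = ("trim", upto)
        self.store.commit([rec])
        self._apply(rec)
```

In this model a write costs exactly one commit whether or not its reply is
cached, and a second OSDSketch built over the same FakeStore (standing in for
a restart or a new primary) still answers a resent write from the persisted
reply. Whether the extra bytes per commit are cheap enough on spinning disks
is exactly the open question in the thread, and this sketch says nothing
about that.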