On 15/04/2016, Gregory Farnum wrote:
> So the most common time we really get replay operations is when one of
> the OSDs crash or a PG's acting set changes for some other reason.
> Which means these "cached" operation results need to be persisted to
> disk and then cleaned up, a la the pglog.
> I don't see anything in these data structures that explains how we do
> that efficiently, which is the biggest problem and the reason we don't
> already do reply caching. Am I missing something?

So! I had been considering the usual case of resend to be a transient
connection drop between client and OSD. (An example of why feedback is
nice :)

I /had/ thought of persisting these things as a possible feature we
would want to add that administrators could turn on or off depending on
the level of reliability they wanted (and if they had some NVRAM on the
machine.) I had not thought specifically about persisting them QUICKLY
in the spinning disk case.

One optimization would be refusing to cache read-only ops, so we don't
have to pay for a disk write unless the op is already doing a disk
write. My intuition would suggest a per-OSD op-log that gets written and
committed when the PGLog entry gets committed, but I admit that's just
spur of the moment. It needs a bit more design work, but bundling it
with some of the writes we have to do already seems promising.

> And do you think maybe you could split this up into a thread for each
> topic? I'm having trouble digesting it as such a wall of text. :)

All right. I'll try to make a new thread subject for new concerns as
people bring them up. (Like this one.)

-- 
Senior Software Engineer
Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9
E707 86BA 0C1B 62CC 152C 7C12 80F7 544B 90ED BFB9
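
A rough illustration of the op-log idea floated in this message, for
readers following the thread: cached results for write ops ride along in
the same commit as the PG log entry, and read-only ops are never cached.
This is only a sketch under assumptions. The names Op, Transaction and
ReplyCache are hypothetical and do not correspond to anything in the
Ceph tree, and trimming by version is assumed to mirror pglog trimming.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    reqid: str       # client request id, used to recognise a resend
    is_write: bool   # read-only ops are never cached
    result: int = 0  # result code to replay to the client


@dataclass
class Transaction:
    """Hypothetical stand-in for one batched disk write."""
    pglog_entries: list = field(default_factory=list)
    oplog_entries: list = field(default_factory=list)


class ReplyCache:
    """Hypothetical per-OSD op-log holding results of committed writes."""

    def __init__(self):
        self.committed = {}   # reqid -> (version, result), set on commit
        self.disk_writes = 0  # number of commits that reached "disk"

    def stage(self, txn, op, version):
        # A read adds nothing to the transaction, so it costs no write.
        if not op.is_write:
            return
        # The cached result is bundled with the PG log entry for the op.
        txn.pglog_entries.append((version, op.reqid))
        txn.oplog_entries.append((version, op.reqid, op.result))

    def commit(self, txn):
        # One write covers both the PG log and the op-log entries.
        if not txn.pglog_entries:
            return
        self.disk_writes += 1
        for version, reqid, result in txn.oplog_entries:
            self.committed[reqid] = (version, result)

    def lookup(self, reqid):
        # Returns the cached result for a resend, or None if unknown.
        entry = self.committed.get(reqid)
        return None if entry is None else entry[1]

    def trim(self, up_to_version):
        # Drop cached results at or below the trimmed version.
        self.committed = {
            reqid: (version, result)
            for reqid, (version, result) in self.committed.items()
            if version > up_to_version
        }
```

In this sketch a resend is answered from lookup() only once commit() has
run, so a result is never replayed unless it would survive a crash, and
a transaction holding only reads never touches the disk.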