On Tuesday, March 19, 2013 at 11:49 PM, Yan, Zheng wrote:
> On 03/20/2013 02:15 PM, Sage Weil wrote:
> > On Wed, 20 Mar 2013, Yan, Zheng wrote:
> > > On 03/20/2013 07:09 AM, Greg Farnum wrote:
> > > > Hmm, this is definitely narrowing the race (probably enough to never
> > > > hit it), but it's not actually eliminating it (if the restart happens
> > > > after 4 billion requests?). More importantly, this kind of symptom
> > > > makes me worry that we might be papering over more serious issues with
> > > > colliding states in the Table on restart.
> > > > I don't have the MDSTable semantics in my head, so I'll need to look
> > > > into this later unless somebody else volunteers to do so?
> > >
> > > It's not just 4 billion requests: an MDS restart has several stages, and
> > > the mdsmap epoch increases for each stage. I don't think there are any
> > > more colliding states in the table. The table client/server use
> > > two-phase commit; it's similar to a client request that involves
> > > multiple MDSes, and the reqid is analogous to the client request ID.
> > > The difference is that a client request ID is unique because a new
> > > client always gets a unique session ID.
> >
> > Each time a tid is consumed (at least for an update) it is journaled in
> > the EMetaBlob::table_tids list, right? So we could actually take a max
> > from journal replay and pick up where we left off? That seems like the
> > cleanest.
> >
> > I'm not too worried about 2^32 tids, I guess, but it would be nicer to
> > avoid that possibility.
>
> Can we re-use the client request ID as the table client request ID?
>
> Regards
> Yan, Zheng

Not sure what you're referring to here — do you mean the ID of the filesystem client request which prompted the update? I don't think that would work, as client requests actually require two parts to be unique (the client GUID and the request seq number), and I'm pretty sure a single client request can spawn multiple Table updates.
As I look over this more, it sure looks to me as if the effect of the code we have (when non-broken) is to roll back every non-committed request by an MDS which restarted — the only time it can handle the TableServer's "agree" with a different response is if the MDS was incorrectly marked out by the map. Am I parsing this correctly, Sage?

Given that, and without having looked at the code more broadly, I think we want to add some sort of implicit or explicit handshake letting each of them know whether the MDS actually disappeared. We use the process/address nonce to accomplish this in other places…
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html