On Tuesday, March 19, 2013 at 11:49 PM, Yan, Zheng wrote:
> On 03/20/2013 02:15 PM, Sage Weil wrote:
> > On Wed, 20 Mar 2013, Yan, Zheng wrote:
> > > On 03/20/2013 07:09 AM, Greg Farnum wrote:
> > > > Hmm, this is definitely narrowing the race (probably enough to never
> > > > hit it), but it's not actually eliminating it (if the restart happens
> > > > after 4 billion requests?). More importantly, this kind of symptom
> > > > makes me worry that we might be papering over more serious issues with
> > > > colliding states in the Table on restart.
> > > > I don't have the MDSTable semantics in my head, so I'll need to look
> > > > into this later unless somebody else volunteers to do so?
> > >
> > > It's not just 4 billion requests: an MDS restart has several stages, and
> > > the mdsmap epoch increases for each stage. I don't think there are any
> > > more colliding states in the table. The table client/server use
> > > two-phase commit; it's similar to a client request that involves
> > > multiple MDSes, and the reqid is analogous to the client request ID.
> > > The difference is that a client request ID is unique because a new
> > > client always gets a unique session ID.
> >
> > Each time a tid is consumed (at least for an update) it is journaled in
> > the EMetaBlob::table_tids list, right? So we could actually take a max
> > from journal replay and pick up where we left off? That seems like the
> > cleanest.
> >
> > I'm not too worried about 2^32 tids, I guess, but it would be nicer to
> > avoid that possibility.
>
> Can we re-use the client request ID as the table client request ID?
>
> Regards
> Yan, Zheng

Not sure what you're referring to here — do you mean the ID of the filesystem client request which prompted the update? I don't think that would work, as client requests actually require two parts to be unique (the client GUID and the request seq number), and I'm pretty sure a single client request can spawn multiple Table updates.
As I look over this more, it sure looks to me as if the effect of the code we have (when non-broken) is to roll back every non-committed request by an MDS which restarted — the only time it can handle the TableServer's "agree" with a different response is if the MDS was incorrectly marked out by the map. Am I parsing this correctly, Sage?

Given that, and without having looked at the code more broadly, I think we want to add some sort of implicit or explicit handshake letting each of them know whether the MDS actually disappeared. We use the process/address nonce to accomplish this in other places…
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html