On 03/20/2013 02:15 PM, Sage Weil wrote: > On Wed, 20 Mar 2013, Yan, Zheng wrote: >> On 03/20/2013 07:09 AM, Greg Farnum wrote: >>> Hmm, this is definitely narrowing the race (probably enough to never hit it), but it's not actually eliminating it (if the restart happens after 4 billion requests?). More importantly this kind of symptom makes me worry that we might be papering over more serious issues with colliding states in the Table on restart. >>> I don't have the MDSTable semantics in my head so I'll need to look into this later unless somebody else volunteers to do so? >> >> Not just 4 billion requests, MDS restart has several stage, mdsmap epoch >> increases for each stage. I don't think there are any more colliding >> states in the table. The table client/server use two phase commit. it's >> similar to client request that involves multiple MDS. the reqid is >> analogy to client request id. The difference is client request ID is >> unique because new client always get an unique session id. > > Each time a tid is consumed (at least for an update) it is journaled in > the EMetaBlob::table_tids list, right? So we could actually take a max > from journal replay and pick up where we left off? That seems like the > cleanest. This approach works only if client journal the reqid before it sending the request to the server. but current implementation is client journal the reqid when it receives server's agree message. Regards Yan, Zheng > > I'm not too worried about 2^32 tids, I guess, but it would be nicer to > avoid that possibility. > > sage > >> >> Thanks >> Yan, Zheng >> >>> -Greg >>> >>> Software Engineer #42 @ http://inktank.com | http://ceph.com >>> >>> >>> On Sunday, March 17, 2013 at 7:51 AM, Yan, Zheng wrote: >>> >>>> From: "Yan, Zheng" <zheng.z.yan@xxxxxxxxx> >>>> >>>> When a MDS becomes active, the table server re-sends 'agree' messages >>>> for old prepared request. If the recoverd MDS starts a new table request >>>> at the same time, The new request's ID can happen to be the same as old >>>> prepared request's ID, because current table client assigns request ID >>>> from zero after MDS restarts. >>>> >>>> Signed-off-by: Yan, Zheng <zheng.z.yan@xxxxxxxxx (mailto:zheng.z.yan@xxxxxxxxx)> >>>> --- >>>> src/mds/MDS.cc (http://MDS.cc) | 3 +++ >>>> src/mds/MDSTableClient.cc (http://MDSTableClient.cc) | 5 +++++ >>>> src/mds/MDSTableClient.h | 2 ++ >>>> 3 files changed, 10 insertions(+) >>>> >>>> diff --git a/src/mds/MDS.cc (http://MDS.cc) b/src/mds/MDS.cc (http://MDS.cc) >>>> index bb1c833..859782a 100644 >>>> --- a/src/mds/MDS.cc (http://MDS.cc) >>>> +++ b/src/mds/MDS.cc (http://MDS.cc) >>>> @@ -1212,6 +1212,9 @@ void MDS::boot_start(int step, int r) >>>> dout(2) << "boot_start " << step << ": opening snap table" << dendl; >>>> snapserver->load(gather.new_sub()); >>>> } >>>> + >>>> + anchorclient->init(); >>>> + snapclient->init(); >>>> >>>> dout(2) << "boot_start " << step << ": opening mds log" << dendl; >>>> mdlog->open(gather.new_sub()); >>>> diff --git a/src/mds/MDSTableClient.cc (http://MDSTableClient.cc) b/src/mds/MDSTableClient.cc (http://MDSTableClient.cc) >>>> index ea021f5..beba0a3 100644 >>>> --- a/src/mds/MDSTableClient.cc (http://MDSTableClient.cc) >>>> +++ b/src/mds/MDSTableClient.cc (http://MDSTableClient.cc) >>>> @@ -34,6 +34,11 @@ >>>> #undef dout_prefix >>>> #define dout_prefix *_dout << "mds." << mds->get_nodeid() << ".tableclient(" << get_mdstable_name(table) << ") " >>>> >>>> +void MDSTableClient::init() >>>> +{ >>>> + // make reqid unique between MDS restarts >>>> + last_reqid = (uint64_t)mds->mdsmap->get_epoch() << 32; >>>> +} >>>> >>>> void MDSTableClient::handle_request(class MMDSTableRequest *m) >>>> { >>>> diff --git a/src/mds/MDSTableClient.h b/src/mds/MDSTableClient.h >>>> index e15837f..78035db 100644 >>>> --- a/src/mds/MDSTableClient.h >>>> +++ b/src/mds/MDSTableClient.h >>>> @@ -63,6 +63,8 @@ public: >>>> MDSTableClient(MDS *m, int tab) : mds(m), table(tab), last_reqid(0) {} >>>> virtual ~MDSTableClient() {} >>>> >>>> + void init(); >>>> + >>>> void handle_request(MMDSTableRequest *m); >>>> >>>> void _prepare(bufferlist& mutation, version_t *ptid, bufferlist *pbl, Context *onfinish); >>>> -- >>>> 1.7.11.7 >>> >>> >>> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html