Learned quite a bit of ceph details! Thanks!

Henry

On Thu, Sep 23, 2010 at 3:16 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Wed, 22 Sep 2010, Henry C Chang wrote:
>> 2. http://github.com/tcloud/ceph/commit/e3147e8929220997017de8fffa34b9d9c2abf9cf
>>
>> We hit this assert fail once. Can you check if this patch is reasonable?
>
> I pushed a different fix for this,
>
>     http://ceph.newdream.net/git/?p=ceph.git;a=commitdiff;h=a783f409e5e5524b4f2c15f78c716ca77e8aeb3c
>
> I think the problem was that the state reset (canceling of timer events,
> etc.) wasn't happening when the election was started due to another node
> (i.e., didn't come through Monitor::call_election()).
>
> The SafeTimer class (which handles the timeouts) is set up to handle mutex
> acquisition for you and let you cancel events without worrying about
> races, and the mutex is held over this whole function, so moving the
> 'state = ' bit around doesn't actually change behavior wrt the timeouts.
>
> Thanks!
> sage
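
To illustrate the point about the timer sharing the caller's mutex, here is a minimal sketch. It is not Ceph's SafeTimer: ToyTimer, add_event_at, cancel_all_events, fire_due, State and reset_and_count are hypothetical names made up for this example, and a logical clock stands in for real timeouts. The assumption it models is only what the message describes: the timer takes the shared mutex before running a callback, and the reset path holds that mutex for its whole duration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Toy stand-in for a timer that shares the caller's mutex.
// Hypothetical; NOT the real SafeTimer interface.
class ToyTimer {
public:
  explicit ToyTimer(std::mutex& lock) : lock_(lock) {}

  // Caller must already hold the shared mutex.
  void add_event_at(int when, std::function<void()> cb) {
    events_.emplace(when, std::move(cb));
  }

  // Caller must already hold the shared mutex. Once this returns,
  // none of the previously pending callbacks can run.
  void cancel_all_events() { events_.clear(); }

  // What a timer thread would do: take the shared mutex first,
  // then run every callback that is due at logical time `now`.
  int fire_due(int now) {
    std::lock_guard<std::mutex> g(lock_);
    int fired = 0;
    while (!events_.empty() && events_.begin()->first <= now) {
      auto cb = std::move(events_.begin()->second);
      events_.erase(events_.begin());
      cb();
      ++fired;
    }
    return fired;
  }

private:
  std::mutex& lock_;
  std::multimap<int, std::function<void()>> events_;
};

enum class State { Electing, Idle };

// Schedules one timeout, performs a reset under the mutex, then lets the
// timer run. Returns how many timeouts fired after the reset.
int reset_and_count(bool set_state_first) {
  std::mutex lock;
  ToyTimer timer(lock);
  State state = State::Electing;
  int expired = 0;
  {
    std::lock_guard<std::mutex> g(lock);
    timer.add_event_at(5, [&] { ++expired; });
  }
  {
    std::lock_guard<std::mutex> g(lock);  // held over the whole reset
    if (set_state_first) state = State::Idle;
    timer.cancel_all_events();
    if (!set_state_first) state = State::Idle;
  }
  timer.fire_due(10);  // cannot interleave with the reset above
  (void)state;
  return expired;
}
```

In this sketch, reset_and_count(true) and reset_and_count(false) both return 0: because the timer cannot run a callback until the reset releases the mutex, the position of the state assignment relative to the cancel makes no difference to the timeouts.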