On Wed, 22 Sep 2010, Henry C Chang wrote:
> 2. http://github.com/tcloud/ceph/commit/e3147e8929220997017de8fffa34b9d9c2abf9cf
>
> We hit this assert fail once. Can you check if this patch is reasonable?

I pushed a different fix for this,

	http://ceph.newdream.net/git/?p=ceph.git;a=commitdiff;h=a783f409e5e5524b4f2c15f78c716ca77e8aeb3c

I think the problem was that the state reset (canceling of timer events,
etc.) wasn't happening when the election was started due to another node
(i.e., didn't come through Monitor::call_election()).

The SafeTimer class (which handles the timeouts) is set up to handle mutex
acquisition for you and let you cancel events without worrying about
races, and the mutex is held over this whole function, so moving the
'state = ' bit around doesn't actually change behavior wrt the timeouts.

Thanks!
sage
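For illustration only, here is a minimal sketch of the idea described above: both ways of starting an election go through one reset path that cancels pending timeouts, and everything happens under a single mutex, so the order of the state assignment relative to the cancel does not matter to the timer. ToyTimer, ToyElector, handle_propose_from_peer() and run_two_starts() are hypothetical names made up for this sketch; they are not the Ceph SafeTimer or Monitor interfaces, and the real fix is the commit linked above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a lock-aware timer (NOT Ceph's SafeTimer API).
// Callers hold the shared mutex when adding or cancelling events; the
// "timer thread" (fire_all) takes the same mutex before running callbacks.
class ToyTimer {
public:
  explicit ToyTimer(std::mutex& m) : lock_(m) {}

  // Caller must hold the shared mutex.
  int add_event(std::function<void()> cb) {
    events_[next_id_] = std::move(cb);
    return next_id_++;
  }

  // Caller must hold the shared mutex. Since callbacks only run under the
  // same mutex, a cancelled event can never fire afterwards.
  void cancel_all() { events_.clear(); }

  // Simulates the timer thread: acquires the mutex itself, then fires.
  void fire_all() {
    std::lock_guard<std::mutex> g(lock_);
    std::map<int, std::function<void()>> pending = std::move(events_);
    events_.clear();
    for (auto& kv : pending) kv.second();
  }

  std::size_t pending() const { return events_.size(); }

private:
  std::mutex& lock_;
  std::map<int, std::function<void()>> events_;
  int next_id_ = 0;
};

// Hypothetical elector: both entry points share start_locked().
class ToyElector {
public:
  enum State { IDLE, ELECTING };

  ToyElector() : timer(lock) {}

  // Election started locally.
  void call_election() {
    std::lock_guard<std::mutex> g(lock);
    start_locked();
  }

  // Election started because another node proposed one.
  void handle_propose_from_peer() {
    std::lock_guard<std::mutex> g(lock);
    start_locked();
  }

  std::mutex lock;
  ToyTimer timer;
  State state = IDLE;
  int expirations = 0;

private:
  // Runs with the mutex held. Moving the state assignment above or below
  // cancel_all() changes nothing for the timer, which cannot run until
  // the mutex is released.
  void start_locked() {
    timer.cancel_all();  // drop any timeout left over from a previous round
    state = ELECTING;
    timer.add_event([this] { ++expirations; state = IDLE; });
  }
};

// Starts an election twice (locally, then via a peer) and reports how many
// timeouts actually fire.
int run_two_starts() {
  ToyElector e;
  e.call_election();
  e.handle_propose_from_peer();
  e.timer.fire_all();
  return e.expirations;
}
```

In this sketch the second start cancels the first timeout, so only one expiration callback is left to run. If handle_propose_from_peer() skipped the shared reset, the stale timeout from the first round would still be queued, which is the kind of leftover state the paragraph above describes.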