Re: Several patches for CEPH

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 22 Sep 2010 12:16:31 -0700 (PDT)

On Wed, 22 Sep 2010, Henry C Chang wrote:
> 2. http://github.com/tcloud/ceph/commit/e3147e8929220997017de8fffa34b9d9c2abf9cf
> 
> We hit this assert fail once. Can you check if this patch is reasonable?

I pushed a different fix for this:

	http://ceph.newdream.net/git/?p=ceph.git;a=commitdiff;h=a783f409e5e5524b4f2c15f78c716ca77e8aeb3c

I think the problem was that the state reset (canceling of timer events, 
etc.) wasn't happening when the election was started by another node 
(i.e., when it didn't come through Monitor::call_election()).

The SafeTimer class (which handles the timeouts) is set up to handle mutex 
acquisition for you and let you cancel events without worrying about 
races, and the mutex is held over this whole function, so moving the 
'state = ' bit around doesn't actually change behavior wrt the timeouts.
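
To illustrate the locking argument, here is a rough, self-contained sketch. 
It is not the actual SafeTimer or Monitor code: ToyTimer, ToyMonitor, 
fire_pending() and stale_timeouts_after_election() are made-up names, and 
the "timer thread" is just a std::thread that tries to fire events in the 
middle of the function. Since it needs the same mutex, it cannot run until 
the function returns, by which point the events are already canceled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a mutex-aware timer (NOT Ceph's SafeTimer).
class ToyTimer {
public:
  explicit ToyTimer(std::mutex& m) : lock_(m) {}

  // Caller must already hold the mutex.
  int add_event(std::function<void()> cb) {
    events_[next_id_] = std::move(cb);
    return next_id_++;
  }

  // Caller must already hold the mutex.
  void cancel_all() { events_.clear(); }

  // Plays the role of the timer thread: takes the mutex itself before
  // running any callback, so it can never interleave with a lock holder.
  void fire_pending() {
    std::lock_guard<std::mutex> g(lock_);
    std::map<int, std::function<void()>> pending = std::move(events_);
    events_.clear();
    for (auto& kv : pending) kv.second();
  }

private:
  std::mutex& lock_;
  std::map<int, std::function<void()>> events_;
  int next_id_ = 0;
};

enum class State { Probing, Electing };

// Hypothetical monitor-like object; only the locking pattern matters here.
struct ToyMonitor {
  std::mutex lock;
  ToyTimer timer{lock};
  State state = State::Probing;
  int stale_timeouts = 0;

  void arm_timeout() {
    std::lock_guard<std::mutex> g(lock);
    timer.add_event([this] { ++stale_timeouts; });
  }

  // state_first selects whether 'state = ' comes before or after the cancel.
  void start_election(bool state_first) {
    std::thread timer_thread;
    {
      std::lock_guard<std::mutex> g(lock);  // held over the whole function
      if (state_first) state = State::Electing;
      // The timer thread wakes up here but blocks on the mutex.
      timer_thread = std::thread([this] { timer.fire_pending(); });
      timer.cancel_all();
      if (!state_first) state = State::Electing;
    }
    timer_thread.join();
  }
};

// Returns how many canceled timeouts still fired (should be none).
int stale_timeouts_after_election(bool state_first) {
  ToyMonitor mon;
  mon.arm_timeout();
  mon.start_election(state_first);
  return mon.stale_timeouts;
}
```

Calling stale_timeouts_after_election() with true and with false gives the 
same result in this toy model: the canceled timeout never fires, regardless 
of where the state assignment sits relative to the cancel.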

Thanks!
sage