Hi Ceph users,
--
Agung Laksono
I am studying bugs in Ceph and am trying to reproduce http://tracker.ceph.com/issues/15113 on my local machine.
In the issue description, I can see the following scenario:
- ______________________ SafeTimer::timer_thread(), with mon_lock:
- ______________________ elector: in win_election(), it calls resend_routed_requests() and collects the routed requests
- msgr: in ms_handle_reset(), it resets the session
- msgr: it waits for the lock
- ______________________ elector: in win_election(), it calls handle_command(), but the session has been reset, hence it panics.
- msgr: removes the session and erases the related requests from Monitor::routed_requests.
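
To check my understanding of this interleaving, here is a toy, single-threaded Python model of the steps above. This is only a sketch: ToyMonitor, Session, the "closed" flag and run_interleaving() are names I made up for illustration and are not Ceph code, and the RuntimeError merely stands in for the panic mentioned in the issue.

```python
class Session:
    """Hypothetical stand-in for a monitor session."""
    def __init__(self, name):
        self.name = name
        self.closed = False


class ToyMonitor:
    """Hypothetical model; not the real Monitor class."""
    def __init__(self):
        self.lock_held_by = None      # who holds the toy "mon_lock"
        self.routed_requests = {}     # request id -> Session
        self.log = []

    def resend_routed_requests(self):
        # elector side: take a snapshot of the pending routed requests
        self.log.append("elector: collected routed requests")
        return list(self.routed_requests.items())

    def ms_handle_reset(self, session):
        # msgr side: the session is marked as reset before the lock is taken
        session.closed = True
        self.log.append("msgr: reset session")

    def handle_command(self, rid, session):
        # elector side: replaying a request on a reset session fails
        if session.closed:
            raise RuntimeError("session %s already reset" % session.name)
        self.log.append("elector: handled request %d" % rid)

    def remove_session(self, session):
        # msgr side: drop every routed request that belongs to the session
        self.routed_requests = {
            rid: s for rid, s in self.routed_requests.items() if s is not session
        }
        self.log.append("msgr: removed session and its routed requests")


def run_interleaving():
    mon = ToyMonitor()
    sess = Session("client.1")
    mon.routed_requests[1] = sess

    mon.lock_held_by = "timer"                  # timer thread takes mon_lock
    collected = mon.resend_routed_requests()    # elector collects requests
    mon.ms_handle_reset(sess)                   # msgr resets the session
    mon.log.append("msgr: waiting for mon_lock")
    try:
        for rid, s in collected:                # elector replays stale snapshot
            mon.handle_command(rid, s)
    except RuntimeError as err:
        mon.log.append("elector: panic (%s)" % err)
    mon.lock_held_by = "msgr"                   # msgr finally gets the lock
    mon.remove_session(sess)
    mon.lock_held_by = None
    return mon


if __name__ == "__main__":
    for line in run_interleaving().log:
        print(line)
```

If this model matches the real flow, the failure depends on the session being reset between the moment the routed requests are collected and the moment they are replayed.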
I understand the whole flow of the bug except for ms_handle_reset(). What triggers ms_handle_reset()?
I could not find any place in the Monitor class where this function is called. I've tried to reproduce this on a cluster with 3 MONs and 4 OSDs
and added a printf that fires whenever this function is called. However, the calls seem arbitrary. Could anyone help explain this?
Thanks.
Cheers,