Right now, unfortunately, the best way to deal with this is to gather logs and figure out which operation got blocked. Can you add

    debug mds = 20
    debug ms = 1
    debug journaler = 20

to your MDS config, restart, and then search through the log or post it somewhere we can check it out? (It will be large.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Oct 8, 2013 at 10:21 PM, Dong Yuan <yuandong1222@xxxxxxxxx> wrote:
> I think I found a bug in the MDS clientreplay handling (different from #4742).
>
> After failover, the standby MDS begins to recover and enters the
> clientreplay state, but it never moves on to the next state (active).
>
> I dumped the mds process with gcore and inspected the core in gdb. The
> main thread (dispatch thread) is idle and mdcache->active_request is
> empty, but mds->replay_queue still has one element, which is strange.
>
> From the code, replay_queue holds all the requests that need to be
> replayed. When the MDS enters the clientreplay state,
> MDS::queue_one_replay is called to pick a request from the
> replay_queue and put it into finished_queue, so the replay
> operation starts.
>
> After the first replay request has finished, MDS::queue_one_replay
> should be called again to handle the next replay request. There are
> three paths that do this:
> 1) Server::journal_and_reply
> 2) MDCache::request_cleanup
> 3) Server::handle_client_request
>
> But it seems that none of these paths called MDS::queue_one_replay. As
> a result, the MDS is stuck in the clientreplay state.
>
> Maybe there is a request processing path that never uses any of these
> three methods. But I can't find the previous request; it seems to have
> completed and been cleaned up from the MDCache.
>
> Does anyone have an idea about this problem?
>
> I can give more details if needed. I have the core dump, but it is too
> big (300MB+) to upload.
>
> Thanks for any help.
>
> --
> Dong Yuan
> Email:yuandong1222@xxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
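To make the suspected failure mode concrete, here is a minimal Python sketch of the replay handoff described in the report. It is a hypothetical, heavily simplified model, not Ceph's MDS code: the ReplayModel class, the dispatch() and run() helpers and the kick_next flag are invented for illustration, and only replay_queue, finished_queue and queue_one_replay mirror names from the report. It also assumes, for simplicity, that the switch to active happens directly inside queue_one_replay once replay_queue is empty. Because each completed request must kick the next one, a single completion path that skips the kick leaves the dispatcher idle with one element still in replay_queue, the same shape as the state seen in the core dump.

```python
from collections import deque
from enum import Enum


class State(Enum):
    CLIENT_REPLAY = "clientreplay"
    ACTIVE = "active"


class ReplayModel:
    """Hypothetical, simplified model of the clientreplay handoff.
    NOT Ceph's MDS code; only the queue names mirror the report."""

    def __init__(self, requests):
        self.state = State.CLIENT_REPLAY
        self.replay_queue = deque(requests)  # requests still to be replayed
        self.finished_queue = deque()        # requests ready for dispatch

    def queue_one_replay(self):
        # Hand one request to the dispatcher, or (assumption of this model)
        # leave clientreplay once nothing is left to replay.
        if not self.replay_queue:
            self.state = State.ACTIVE
            return
        self.finished_queue.append(self.replay_queue.popleft())

    def dispatch(self, kick_next=True):
        # Process everything in finished_queue. kick_next=False models a
        # completion path that never calls queue_one_replay().
        while self.finished_queue:
            self.finished_queue.popleft()  # "process" the request
            if kick_next:
                self.queue_one_replay()


def run(requests, kick_next):
    mds = ReplayModel(requests)
    mds.queue_one_replay()  # entering clientreplay kicks the first request
    mds.dispatch(kick_next)
    return mds


ok = run(["req1", "req2"], kick_next=True)
stuck = run(["req1", "req2"], kick_next=False)
print(ok.state.value, len(ok.replay_queue))        # active 0
print(stuck.state.value, len(stuck.replay_queue))  # clientreplay 1
```

In the stuck run nothing is left in finished_queue, so the dispatcher has no work, yet the state never leaves clientreplay: the only way forward is another queue_one_replay() call that no remaining path makes.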