Re: mds stuck in clientreplay state after failover

Right now the best way to deal with this is unfortunately to get logs
and figure out what operation got blocked. Can you add

debug mds = 20
debug ms = 1
debug journaler = 20

to your mds config, restart, and then search through that log or post
it somewhere we can check it out? (It will be large.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Oct 8, 2013 at 10:21 PM, Dong Yuan <yuandong1222@xxxxxxxxx> wrote:
> I think I found a bug in the clientreplay handling of the mds
> (different from #4742).
>
> After failover, the standby mds begins to recover, enters the
> clientreplay state, and never moves to the next state (active).
>
> I dumped the mds process with gcore and inspected it in gdb. The main
> thread (dispatch thread) is idle and mdcache->active_request is empty,
> but mds->replay_queue still has one element, which is strange.
>
> From the code, replay_queue holds all requests that need to be
> replayed. When the mds enters the clientreplay state,
> MDS::queue_one_replay is called to pick a request from replay_queue
> and put it into finished_queue, so the replay operation begins.
>
> After the first replay request has finished, MDS::queue_one_replay
> should be called again to deal with the next replay request. There are
> three paths that do this:
> 1) Server::journal_and_reply
> 2) MDCache::request_cleanup
> 3) Server::handle_client_request
>
> But it seems that none of these paths called MDS::queue_one_replay. As
> a result, the mds is stuck in the clientreplay state.
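
To illustrate the chain described above, here is a minimal toy model in
C++. It is only a sketch, not Ceph's code: ToyMDS, ToyRequest,
completion_requeues, dispatch and run_replay are hypothetical names. The
only thing taken from the thread is the idea that each finished replay
request must call queue_one_replay to pull the next one off
replay_queue. Under that assumption, a single request that completes on
a path skipping the call leaves the toy in clientreplay with entries
still sitting in replay_queue.

```cpp
#include <deque>
#include <string>
#include <utility>

// Toy model only: these are NOT Ceph's real classes or signatures.
enum class State { ClientReplay, Active };

struct ToyRequest {
  std::string name;
  // Hypothetical flag: does this request's completion path call
  // queue_one_replay() when it finishes?
  bool completion_requeues;
};

struct ToyMDS {
  State state = State::ClientReplay;
  std::deque<ToyRequest> replay_queue;    // requests still to be replayed
  std::deque<ToyRequest> finished_queue;  // requests ready for dispatch

  // Move one request from replay_queue to finished_queue, or go active
  // once nothing is left to replay.
  bool queue_one_replay() {
    if (replay_queue.empty()) {
      state = State::Active;
      return false;
    }
    finished_queue.push_back(replay_queue.front());
    replay_queue.pop_front();
    return true;
  }

  // Process queued requests. Only a request whose completion path
  // re-queues keeps the chain going.
  void dispatch() {
    while (!finished_queue.empty()) {
      ToyRequest req = finished_queue.front();
      finished_queue.pop_front();
      if (req.completion_requeues)
        queue_one_replay();
    }
  }
};

// Replay the given requests and return the resulting toy MDS so the
// caller can inspect its state and any leftover replay_queue entries.
ToyMDS run_replay(std::deque<ToyRequest> reqs) {
  ToyMDS mds;
  mds.replay_queue = std::move(reqs);
  mds.queue_one_replay();  // entering clientreplay starts the first request
  mds.dispatch();
  return mds;
}
```

Calling run_replay with three requests where the second has
completion_requeues set to false leaves state at State::ClientReplay and
one element in replay_queue, while a run in which every request
re-queues ends in State::Active.
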
>
> Maybe there is a request-processing path that never goes through any
> of the above three methods. But I can't find the previous request; it
> seems to have completed and been cleaned up from the MDCache.
>
> Does anyone have any ideas about this problem?
>
> I can give more details if needed. I have the core dump but it is too
> big (300MB+) to upload.
>
> Thanks for any help.
>
> --
> Dong Yuan
> Email:yuandong1222@xxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html