Re: cephfs client hang issue


 



> On Apr 21, 2018, at 1:54 AM, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> 
> On Fri, Apr 20, 2018 at 7:03 AM, 陶冬冬 <tdd21151186@xxxxxxxxx> wrote:
>> Hi,
>> I've been following the client hang issue recently. Rishabh Dave has a PR to fix it (https://github.com/ceph/ceph/pull/21065).
>> The main idea of this PR is that the client tick checks whether there is a slow request that has not received a reply for 120s.
>> If there is one, the client requests the latest osdmap from the monitor. Once the client gets the latest osdmap, the hanging requests are cleaned up.
>> 
>> I know this can fix the hang issue, but it seems like a workaround to me, because unmount will still have to wait for 120s, which I don't think is user friendly.
>> I would like to propose another solution here: we request the latest osdmap in the client's ms_handle_remote_reset.
>> Here is the eviction flow:
>> [1] The mds sends the evict command to the monitor.
>> [2] The mds then waits for the latest osdmap.
>> [3] The mds gets the latest osdmap and then kills the session of the evicted client, which is now on the blacklist.
>> [4] When the evicted client tries to communicate with the mds, it will certainly get a RESET SESSION reply.
> 
> What you've described makes sense to me Dongdong. Rishabh, can you try
> testing a solution that gets the latest osdmap in
> ::ms_handle_remote_reset for case: MetaSession::STATE_OPENING.
> (Rishabh, we can drop [1] if this works as expected.)
> 
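A minimal sketch of the suggestion above, written as a toy model and not as the actual Ceph client code. ToyState, ToyMonClient, on_remote_reset and the also_when_open flag are all hypothetical names invented for illustration; the flag only exists so the same sketch covers both the STATE_OPENING case and the question of whether STATE_OPEN should be handled too.

```cpp
#include <cassert>

// Toy stand-in for the session states discussed in the thread.
// Not Ceph's MetaSession; names are hypothetical.
enum class ToyState { Opening, Open, Closing, Closed };

// Toy stand-in for the monitor client: it only counts osdmap requests.
struct ToyMonClient {
  int osdmap_requests = 0;
  void request_latest_osdmap() { ++osdmap_requests; }
};

// Models the reset handler: on a remote reset, ask the monitor for the
// latest osdmap so the client can learn that it has been blacklisted.
// Returns true if a request was issued.
inline bool on_remote_reset(ToyState state, ToyMonClient& mon,
                            bool also_when_open) {
  switch (state) {
    case ToyState::Opening:
      mon.request_latest_osdmap();
      return true;
    case ToyState::Open:
      // Assumption: whether this case should also fetch the map is the
      // open question in the thread, so it is left configurable here.
      if (also_when_open) {
        mon.request_latest_osdmap();
        return true;
      }
      return false;
    case ToyState::Closing:
    case ToyState::Closed:
      return false;
  }
  return false;
}
```

With also_when_open set to false only a reset in the Opening state triggers a request; with it set to true the Open state does as well.
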
   Patrick, should we also test this on MetaSession::STATE_OPEN?
   Also, I don't see Client::handle_osd_map trying to clean up the requests that are waiting on the list "session->waiting_for_open".
   I think this will also leave the client hung in state MetaSession::STATE_OPENING even after it notices that it is on the blacklist.
   So I think we should also call signal_context_list(session->waiting_for_open) in Client::handle_osd_map.
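
   To illustrate the idea, here is a rough sketch as a self-contained toy model. It is not the real Client code: ToySession, ToyOsdMap, signal_waiters, toy_handle_osd_map and kToyBlacklistedError are hypothetical stand-ins, and the error value passed to the waiters is an assumption made only for the example.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <set>
#include <string>

// Toy stand-ins; none of these are Ceph's real classes.
enum class SessionState { Opening, Open, Closed };

// Hypothetical error value handed to waiters when the client is blacklisted.
constexpr int kToyBlacklistedError = -1;

using Waiter = std::function<void(int)>;

struct ToySession {
  SessionState state = SessionState::Opening;
  // Requests parked until the session opens (models "waiting_for_open").
  std::list<Waiter> waiting_for_open;
};

struct ToyOsdMap {
  unsigned epoch = 0;
  std::set<std::string> blacklist;  // blacklisted client addresses
};

// Models waking a context list: detach the waiters first, then complete
// each one, so a callback that re-queues itself is not run twice.
inline void signal_waiters(std::list<Waiter>& waiters, int rc) {
  std::list<Waiter> ready;
  ready.swap(waiters);
  assert(waiters.empty());
  for (auto& cb : ready) cb(rc);
}

// Models the proposed addition to the osdmap handler: if the new map
// blacklists this client, close the session and wake everything parked
// on waiting_for_open instead of leaving it hanging.
// Returns true if the client was found on the blacklist.
inline bool toy_handle_osd_map(const ToyOsdMap& map,
                               const std::string& my_addr,
                               ToySession& session) {
  if (map.blacklist.count(my_addr) == 0) return false;
  session.state = SessionState::Closed;
  signal_waiters(session.waiting_for_open, kToyBlacklistedError);
  return true;
}
```

   In this model, a request queued while the session is still opening gets completed as soon as a map containing the client's address in the blacklist arrives, rather than waiting forever.
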

>> So, as long as the session is killed after the osdmap is updated in the monitor, ms_handle_remote_reset will always get the client the latest osdmap, which at least contains the new blacklist.
> 
> As long as the client keeps trying to reconnect (it does), the
> client will eventually get an osdmap that notes it's blacklisted. So
> the ordering is not necessarily important here.
> 
> [1] https://github.com/ceph/ceph/pull/21065/commits/d3fb8c24c7df4fa7443bce71c1adf8cf84c6361e
> 
> -- 
> Patrick Donnelly




