Re: cephfs client hang issue

On Fri, Apr 20, 2018 at 7:03 AM, 陶冬冬 <tdd21151186@xxxxxxxxx> wrote:
> Hi,
> I've been following the client hang issue recently. Rishabh Dave has a PR to fix it (https://github.com/ceph/ceph/pull/21065).
> The main idea of the PR is that the client tick checks whether any slow request has gone without a reply for 120s.
> If there is one, the client requests the latest osdmap from the monitor; once the client gets the latest osdmap, the hanging requests are cleaned up.
>
> I know this can fix the hang, but it seems like a workaround to me, because unmount will still have to wait for 120s, which I don't think is user friendly.
> I would like to propose another solution: request the latest osdmap in the client's ms_handle_remote_reset.
> Here is the eviction flow:
> [1] the mds sends the evict command to the monitor
> [2] the mds then waits for the latest osdmap
> [3] the mds gets the latest osdmap and then kills the session of the evicted client, which is now on the blacklist
> [4] when the evicted client tries to communicate with the mds, it will certainly get a RESET SESSION reply

What you've described makes sense to me, Dongdong. Rishabh, can you try
testing a solution that gets the latest osdmap in
::ms_handle_remote_reset for case: MetaSession::STATE_OPENING.
(Rishabh, we can drop commit [1], linked below, if this works as expected.)
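
For readers following the thread, the proposed behavior can be sketched as a toy model. This is a self-contained Python illustration, not Ceph code: `ToyMonitor`, `ToyClient`, `SessionState`, and all of their methods are hypothetical names, and the real client is C++ with a different interface. It only shows the idea of fetching the newest osdmap when a remote reset arrives during session opening, and cleaning up hanging requests if the client finds itself blacklisted.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class SessionState(Enum):
    # Hypothetical states; only loosely modeled on MetaSession::STATE_*.
    OPENING = auto()
    OPEN = auto()
    CLOSED = auto()


@dataclass
class ToyMonitor:
    """Hypothetical stand-in for the monitor: holds the newest osdmap."""
    epoch: int = 1
    blacklist: set = field(default_factory=set)

    def blacklist_client(self, addr):
        # Adding a blacklist entry produces a new osdmap epoch.
        self.blacklist.add(addr)
        self.epoch += 1

    def latest_osdmap(self):
        return self.epoch, frozenset(self.blacklist)


@dataclass
class ToyClient:
    addr: str
    monitor: ToyMonitor
    state: SessionState = SessionState.OPENING
    osdmap_epoch: int = 0
    pending_requests: list = field(default_factory=list)

    def handle_remote_reset(self):
        """Toy version of the proposal: fetch the newest osdmap on reset."""
        if self.state is not SessionState.OPENING:
            return
        epoch, blacklist = self.monitor.latest_osdmap()
        self.osdmap_epoch = epoch
        if self.addr in blacklist:
            # Blacklisted: drop hanging requests now instead of waiting.
            self.pending_requests.clear()
            self.state = SessionState.CLOSED


# Usage: the client is evicted, then sees a reset while reopening its session.
mon = ToyMonitor()
client = ToyClient(addr="client.a", monitor=mon,
                   pending_requests=["getattr", "lookup"])
mon.blacklist_client(client.addr)
client.handle_remote_reset()
```

In this model the cleanup happens as soon as the reset is handled, rather than after a fixed 120s timeout, which is the user-visible difference raised above.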

> So, as long as the session is killed after the osdmap is updated in the monitor, ms_handle_remote_reset will always get the client a latest osdmap that contains at least the new blacklist.

As long as the client keeps trying to reconnect (it does), the
client will eventually get an osdmap that notes it's blacklisted. So
the ordering is not necessarily important here.
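
The ordering argument can be illustrated with a small sketch. It is a toy simulation under stated assumptions, not Ceph behavior: `attempts_until_blacklisted` and its parameters are hypothetical, and it assumes every reconnect attempt fetches whatever osdmap the monitor currently has.

```python
def attempts_until_blacklisted(blacklist_visible_from, max_attempts=10):
    """Return the first reconnect attempt (1-based) that sees the blacklist.

    blacklist_visible_from is a hypothetical knob: the attempt number from
    which the monitor's osdmap contains the client's blacklist entry.
    Returns None if the client gives up before that point.
    """
    for attempt in range(1, max_attempts + 1):
        # Each attempt fetches the monitor's current osdmap (assumption).
        osdmap_has_blacklist = attempt >= blacklist_visible_from
        if osdmap_has_blacklist:
            return attempt
    return None
```

Whether the blacklist entry is visible on the first attempt or only on a later one, a client that keeps retrying reaches it; only a client that stops retrying early would miss it.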

[1] https://github.com/ceph/ceph/pull/21065/commits/d3fb8c24c7df4fa7443bce71c1adf8cf84c6361e

-- 
Patrick Donnelly