cephfs client hang issue

Hi,
I've been following the CephFS client hang issue recently. Rishabh Dave has a PR to fix it (https://github.com/ceph/ceph/pull/21065).
The main idea of the PR is that the client tick checks whether there is a slow request that has not received a reply for 120s.
If there is one, the client requests the latest osdmap from the monitor; once the client has the latest osdmap, the hanging requests are cleaned up.
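
To make the tick-based idea concrete, here is a toy model in Python. It is only a sketch of the check described above, not the code from the PR; the class and method names (TickClient, send, tick) are made up for illustration, and the osdmap request is modelled as a simple counter.

```python
from dataclasses import dataclass, field

SLOW_REQUEST_TIMEOUT = 120.0  # seconds; the threshold mentioned above


@dataclass
class TickClient:
    """Hypothetical stand-in for the client; not the real Ceph Client class."""
    inflight: dict = field(default_factory=dict)  # request id -> send time
    osdmap_requests: int = 0                      # how often we asked the monitor

    def send(self, req_id, now):
        # Remember when the request was sent so tick() can measure its age.
        self.inflight[req_id] = now

    def tick(self, now):
        # Collect requests that have waited at least the timeout for a reply.
        slow = [r for r, sent in self.inflight.items()
                if now - sent >= SLOW_REQUEST_TIMEOUT]
        if slow:
            # Assumption: one osdmap request per tick is enough in this model.
            self.osdmap_requests += 1
        return slow
```

In this model a request sent at t=0 is ignored by a tick at t=60 and only triggers the osdmap request at t=120, which is where the wait on unmount comes from.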

I know this can fix the hang issue, but it seems like a workaround to me, because unmount will still have to wait for 120s. I don't think that is user friendly.
I would like to propose another solution: request the latest osdmap in the client's ms_handle_remote_reset.
Here is the eviction flow:
[1] the mds sends the evict command to the monitor
[2] the mds then waits for the latest osdmap
[3] the mds gets the latest osdmap and then kills the session of the evicted client, which is now on the blacklist
[4] when the evicted client tries to communicate with the mds, it will certainly get a RESET SESSION reply

So, as long as the session is killed after the osdmap has been updated on the monitor, requesting the osdmap in ms_handle_remote_reset will always get the client a latest osdmap that contains at least the new blacklist entry.
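
The flow and the proposed handler can be sketched as a small Python simulation. This is a model under simplifying assumptions, not Ceph code: ToyMonitor, ToyMDS and EvictedClient are hypothetical classes, the osdmap is a plain dict, and all messaging is a synchronous function call.

```python
class ToyMonitor:
    def __init__(self):
        self.epoch = 1
        self.blacklist = set()

    def blacklist_client(self, addr):
        # Adding a blacklist entry produces a new osdmap epoch.
        self.blacklist.add(addr)
        self.epoch += 1
        return self.epoch

    def latest_osdmap(self):
        return {"epoch": self.epoch, "blacklist": frozenset(self.blacklist)}


class ToyMDS:
    def __init__(self, mon):
        self.mon = mon
        self.osdmap = mon.latest_osdmap()
        self.sessions = set()

    def evict(self, addr):
        wanted = self.mon.blacklist_client(addr)  # [1] evict command to monitor
        self.osdmap = self.mon.latest_osdmap()    # [2] wait for the new osdmap
        if self.osdmap["epoch"] < wanted:
            raise RuntimeError("osdmap is older than the blacklist epoch")
        self.sessions.discard(addr)               # [3] kill the session

    def handle_request(self, addr):
        # [4] a client without a session gets a reset instead of a reply
        return "ok" if addr in self.sessions else "reset"


class EvictedClient:
    def __init__(self, addr, mon, mds):
        self.addr, self.mon, self.mds = addr, mon, mds
        self.osdmap = mon.latest_osdmap()
        self.inflight = set()
        self.blacklisted = False
        mds.sessions.add(addr)

    def send_request(self, req_id):
        self.inflight.add(req_id)
        reply = self.mds.handle_request(self.addr)
        if reply == "reset":
            self.handle_remote_reset()
        else:
            self.inflight.discard(req_id)
        return reply

    def handle_remote_reset(self):
        # Proposed change: fetch the latest osdmap right away on reset.
        self.osdmap = self.mon.latest_osdmap()
        if self.addr in self.osdmap["blacklist"]:
            # The map already shows the blacklist entry, so clean up at once.
            self.inflight.clear()
            self.blacklisted = True
```

Because the session is only killed in step [3], after the monitor has published the new epoch, the map fetched in handle_remote_reset in this model always includes the client's own blacklist entry, and nothing has to wait for a timeout.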

Please correct me if I'm missing something here.

Regards,
Dongdong