Re: cephfs kernel client hangs

Zhenshi Zhou <deaderzzs@xxxxxxxxx> · Mon, 13 Aug 2018 21:22:14 +0800

Hi,

I finally have a running server with the files under /sys/kernel/debug/ceph/xxx/:

[root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat mdsc
[root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat monc
have monmap 2 want 3+
have osdmap 4545 want 4546
have fsmap.user 0
have mdsmap 335 want 336+
fs_cluster_id -1
[root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat osdc
REQUESTS 6 homeless 0
82580   osd10   1.7f9ddac7      [10,13]/10      [10,13]/10      10000053a04.00000000    0x400024        1       write
81019   osd11   1.184ed679      [11,7]/11       [11,7]/11       1000005397b.00000000    0x400024        1       write
81012   osd12   1.cd98ed57      [12,9]/12       [12,9]/12       10000053971.00000000    0x400024        1       write,startsync
82589   osd12   1.7cd5405a      [12,8]/12       [12,8]/12       10000053a13.00000000    0x400024        1       write,startsync
80972   osd13   1.91886156      [13,4]/13       [13,4]/13       10000053939.00000000    0x400024        1       write
81035   osd13   1.ac5ccb56      [13,4]/13       [13,4]/13       10000053997.00000000    0x400024        1       write
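
For anyone triaging a similar hang, an osdc dump like this one can be grouped by target OSD to see where the in-flight writes are queued. This is only a minimal sketch, not an official tool: the function name `summarize_osdc` is hypothetical, and reading the first column as the request tid and the second as the target OSD is an assumption based on the output shown here (the layout may differ between kernel versions).

```python
from collections import defaultdict


def summarize_osdc(text):
    """Group in-flight request ids by target OSD from an osdc dump."""
    pending = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Request rows start with a numeric id followed by "osdN";
        # header lines such as "REQUESTS 6 homeless 0" are skipped.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].startswith("osd"):
            pending[fields[1]].append(int(fields[0]))
    return dict(pending)


# The dump pasted in this message, used as sample input.
SAMPLE = """\
REQUESTS 6 homeless 0
82580   osd10   1.7f9ddac7      [10,13]/10      [10,13]/10      10000053a04.00000000    0x400024        1       write
81019   osd11   1.184ed679      [11,7]/11       [11,7]/11       1000005397b.00000000    0x400024        1       write
81012   osd12   1.cd98ed57      [12,9]/12       [12,9]/12       10000053971.00000000    0x400024        1       write,startsync
82589   osd12   1.7cd5405a      [12,8]/12       [12,8]/12       10000053a13.00000000    0x400024        1       write,startsync
80972   osd13   1.91886156      [13,4]/13       [13,4]/13       10000053939.00000000    0x400024        1       write
81035   osd13   1.ac5ccb56      [13,4]/13       [13,4]/13       10000053997.00000000    0x400024        1       write
"""

if __name__ == "__main__":
    for osd, tids in sorted(summarize_osdc(SAMPLE).items()):
        print(osd, len(tids), tids)
```

On a live client the same function could be fed the contents of the osdc file in the debug directory instead of the sample string.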

The cluster reports no problems and still shows HEALTH_OK.
All I did was open a file stored on CephFS with vim, and it hung. The process is now in 'D' state.
By the way, the mounted directory as a whole is still in use, with no errors.
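
In case it helps others reproduce the check, processes in 'D' state (uninterruptible sleep) can be listed by reading /proc/<pid>/stat. The following is a rough sketch under the assumption of a Linux /proc layout; the helper names `parse_stat_state` and `find_d_state` are made up for illustration.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

For a reported pid, /proc/<pid>/stack (readable as root, if the kernel exposes it) may show where the task is blocked.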

What can I do to fix it?

On Thu, Aug 9, 2018 at 9:42 PM Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,

On 08/09/2018 03:21 PM, Yan, Zheng wrote:

> try 'umount -f', recent kernel should handle 'umount -f' pretty well

> On Wed, Aug 8, 2018 at 10:46 PM Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:
>> Hi,
>>
>> Is there any other way except rebooting the server when the client hangs?
>> If the server is in a production environment, I can't restart it every time.

One method that worked for me today:

- fail over to the other mds (during failover the client's reconnect should be
  denied, e.g. 'mds.ceph-storage-01 [INF] denied reconnect attempt (mds is
  up:reconnect) from client.19660826 192.168.2.92:0/2522971681 (session is
  closed)')
- fail over to the first mds again (this time, the client should not try to
  connect; mds_sessions in the debug directory should not list a session)
- accessing the mountpoint then triggered a reconnect to the now active mds
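
The three steps could be scripted roughly as follows. This is a hedged sketch, not the exact procedure used here: it assumes a two-daemon setup where 'ceph mds fail <name>' is what triggers each failover, the daemon name ceph-storage-02 and the mountpoint /mnt/cephfs are hypothetical placeholders, and the helper names are made up. It only prints the commands unless dry_run is turned off.

```python
import subprocess


def failover_plan(active_mds, standby_mds, mountpoint):
    """Return the commands for the double failover, in order."""
    return [
        # Step 1: fail the active daemon so the standby takes over.
        ["ceph", "mds", "fail", active_mds],
        # Step 2: fail the new active daemon so the first one takes over again.
        ["ceph", "mds", "fail", standby_mds],
        # Step 3: touch the mountpoint to trigger the client reconnect.
        ["ls", mountpoint],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
            # Each failover needs time to complete before the next step.
            input("Check the MDS state ('ceph mds stat'), then press Enter...")


if __name__ == "__main__":
    # Placeholder names; replace with the daemons and mountpoint in use.
    run_plan(failover_plan("ceph-storage-01", "ceph-storage-02", "/mnt/cephfs"))
```

Between steps it is worth checking the MDS log for the denied reconnect and the mds_sessions file in the debug directory, as described in the list, before continuing.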

Regards,

Burkhard

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
