cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

Hi,

we are currently testing CephFS with the kernel module (kernels 4.17 and 4.18) instead of ceph-fuse, which worked fine,

and we are seeing hangs: iowait jumps like crazy for around 20 minutes.

The client is a QEMU 2.12 VM with a virtio-net interface.
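
For reference, we mount the filesystem roughly like this (the monitor address, user name and secret path below are placeholders, not our real values):

mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/cephfs.secret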


In the client logs, we are seeing this kind of message:

[Thu Nov  8 12:20:18 2018] libceph: osd14 x.x.x.x:6801 socket closed (con state OPEN)
[Thu Nov  8 12:42:03 2018] libceph: osd9 x.x.x.x:6821 socket closed (con state OPEN)


and in the OSD logs:

osd14:
2018-11-08 12:20:25.247 7f31ffac8700  0 -- x.x.x.x:6801/1745 >> x.x.x.x:0/3678871522 conn(0x558c430ec300 :6801 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)

osd9:
2018-11-08 12:42:09.820 7f7ca970e700  0 -- x.x.x.x:6821/1739 >> x.x.x.x:0/3678871522 conn(0x564fcbec5100 :6821 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)


The cluster is running Ceph 13.2.1 (Mimic).

Note that we have a physical firewall between the client and the servers; I'm not sure yet whether it could be dropping the sessions (I haven't found any relevant logs on the firewall).
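
To try to rule the firewall in or out, I plan to watch the OSD TCP sessions from the client and capture which side actually closes them. A sketch (the ports are taken from the logs above; the interface name is a placeholder):

# established sessions towards the two OSDs, with timer info
ss -tno state established '( dport = :6801 or dport = :6821 )'

# see whether the teardown is a RST (typical of a firewall drop) or a clean FIN
tcpdump -ni eth0 '(tcp port 6801 or tcp port 6821) and tcp[tcpflags] & (tcp-rst|tcp-fin) != 0'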

Any ideas? I would like to know whether this is a network problem or a Ceph bug (I'm not sure how to interpret the OSD log messages).

Regards,

Alexandre



client ceph.conf
----------------
[client]
fuse_disable_pagecache = true
client_reconnect_stale = true
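
(As far as I understand, these two options only apply to ceph-fuse/libcephfs and are ignored by the kernel client, so on the kernel side the equivalent tuning would have to go through mount options instead. A sketch, assuming the osdkeepalive and mount_timeout options described in mount.ceph(8), with placeholder values:)

mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/cephfs.secret,osdkeepalive=5,mount_timeout=60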

