Hello, I have a Ceph cluster with 3 nodes, each with 3 OSDs, running Proxmox. Ceph versions:

{
    "mon": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 9
    },
    "mds": {},
    "overall": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 15
    }
}
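(That listing is simply the output of the version report command, run on one of the nodes:)

    # prints the ceph version of every mon/mgr/osd/mds daemon in the cluster
    ceph versions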
Ceph has its public and cluster networks on 10.10.10.0/24; the three nodes are 10.10.10.251, 10.10.10.252 and 10.10.10.253, and networking is working fine: I kept a ping from one of the nodes to the other two running for hours, with 0% packet loss.
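Roughly, that check looked like this (the TCP probe of the monitor port is an extra sanity check along the same lines, not something I described above):

    # from 10.10.10.251, and likewise against 10.10.10.253:
    ping 10.10.10.252          # left running for hours: 0% packet loss
    nc -zv 10.10.10.252 6789   # the mon port answers on TCP as well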
On the node with IP 10.10.10.252 I get strange messages in dmesg:

kern :info : [Oct23 14:42] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern :info : [ +0.000391] libceph: mon1 10.10.10.252:6789 session established
kern :info : [ +30.721869] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000749] libceph: mon2 10.10.10.253:6789 session established
kern :info : [Oct23 14:43] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern :info : [ +0.000312] libceph: mon1 10.10.10.252:6789 session established
kern :info : [ +30.721964] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000730] libceph: mon0 10.10.10.251:6789 session established
kern :info : [Oct23 14:44] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.000330] libceph: mon1 10.10.10.252:6789 session established
kern :info : [ +30.721899] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000951] libceph: mon0 10.10.10.251:6789 session established
kern :info : [Oct23 14:45] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.000733] libceph: mon2 10.10.10.253:6789 session established
kern :info : [ +30.721529] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern :info : [ +0.000328] libceph: mon1 10.10.10.252:6789 session established
kern :info : [Oct23 14:46] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.001035] libceph: mon0 10.10.10.251:6789 session established
kern :info : [ +30.721183] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.004221] libceph: mon1 10.10.10.252:6789 session established
kern :info : [Oct23 14:47] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000927] libceph: mon0 10.10.10.251:6789 session established
kern :info : [ +30.721361] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.000524] libceph: mon1 10.10.10.252:6789 session established

and this goes on all day: roughly every 30 seconds the session is dropped and immediately re-established with another monitor. In ceph -w I get:

2017-10-23 14:51:57.941131 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 14:56:57.941433 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 14:56:58.124457 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:00:00.000184 mon.pve-hs-main [INF] overall HEALTH_OK
2017-10-23 15:01:57.941312 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:01:57.941558 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 15:06:57.941420 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:06:57.941544 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 15:11:57.941573 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:11:57.941659 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0

pve-hs-main is the host with IP 10.10.10.251. The Ceph storage is actually under very light load, on average 200 kB/s read or write (as shown by ceph -s), so I don't think it's a problem with the cluster's load. The strange thing is that I see "mon1 10.10.10.252:6789 session lost" in the log of node 10.10.10.252 itself, i.e. the node is losing the session with the monitor running on the same host, so I don't think it's network-related either. I have already tried rebooting the nodes and restarting ceph-mon and ceph-mgr, but the problem is still there. Any ideas? Thanks
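P.S. For completeness, here is a minimal sketch of the further checks I plan to run (these assume the standard Luminous CLI and systemd units; mon.pve-hs-main is the monitor on 10.10.10.251, adjust the name for the other nodes):

    ceph quorum_status --format json-pretty   # confirm all three mons stay in quorum
    ceph time-sync-status                     # rule out clock skew between the mons
    ceph daemon mon.pve-hs-main mon_status    # run locally on 10.10.10.251: the mon's own view
    journalctl -u ceph-mon@pve-hs-main -f     # follow the local mon's log on 10.10.10.251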