Does this mean that the nodes have the public and cluster networks
defined separately but both on 10.10.10.0/24, or that you did not
specify a separate cluster network at all?
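For comparison, a setup with a genuinely separate cluster network would carry two distinct subnets in ceph.conf, along these lines (the addresses here are placeholders, not taken from the poster's configuration):

```ini
[global]
    # Client-facing traffic: monitors, client I/O
    public network  = 10.10.10.0/24
    # OSD replication and heartbeat traffic on its own subnet
    cluster network = 10.10.20.0/24
```

With both options pointing at the same subnet (or with `cluster network` omitted), all traffic shares one network, which is what the question above is trying to establish.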
On 10/23/2017 03:35 PM, Marco Baldini - H.S. Amiata wrote:
Hello
I have a Ceph cluster with 3 nodes, each with 3 OSDs, running on
Proxmox. Ceph versions:
{
    "mon": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 9
    },
    "mds": {},
    "overall": {
        "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 15
    }
}
Ceph has both the public and cluster network on 10.10.10.0/24; the
three nodes are 10.10.10.251, 10.10.10.252 and 10.10.10.253, and
networking is working well (I kept a ping running for hours from one
node to the other two and had 0 packet loss).
On the node with IP 10.10.10.252 I get strange messages in dmesg:
kern :info : [Oct23 14:42] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern :info : [ +0.000391] libceph: mon1 10.10.10.252:6789 session established
kern :info : [ +30.721869] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000749] libceph: mon2 10.10.10.253:6789 session established
kern :info : [Oct23 14:43] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern :info : [ +0.000312] libceph: mon1 10.10.10.252:6789 session established
kern :info : [ +30.721964] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000730] libceph: mon0 10.10.10.251:6789 session established
kern :info : [Oct23 14:44] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.000330] libceph: mon1 10.10.10.252:6789 session established
kern :info : [ +30.721899] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000951] libceph: mon0 10.10.10.251:6789 session established
kern :info : [Oct23 14:45] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.000733] libceph: mon2 10.10.10.253:6789 session established
kern :info : [ +30.721529] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern :info : [ +0.000328] libceph: mon1 10.10.10.252:6789 session established
kern :info : [Oct23 14:46] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.001035] libceph: mon0 10.10.10.251:6789 session established
kern :info : [ +30.721183] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.004221] libceph: mon1 10.10.10.252:6789 session established
kern :info : [Oct23 14:47] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
kern :info : [ +0.000927] libceph: mon0 10.10.10.251:6789 session established
kern :info : [ +30.721361] libceph: mon0 10.10.10.251:6789 session lost, hunting for new mon
kern :info : [ +0.000524] libceph: mon1 10.10.10.252:6789 session established
and this has been going on all day.
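The pattern in the log is strikingly regular: every session is dropped roughly 30.7 seconds after the previous event and immediately re-established, which looks more like a periodic timeout than random packet loss. A quick sketch to pull those intervals out of the dmesg delta timestamps (the sample lines and parsing regex are just illustrative, not a Ceph tool):

```python
import re

# A few of the dmesg lines quoted above, copied here for illustration.
dmesg = """\
[ +30.721869] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
[ +0.000749] libceph: mon2 10.10.10.253:6789 session established
[ +30.721964] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
[ +0.000730] libceph: mon0 10.10.10.251:6789 session established
"""

# Extract the relative timestamp of each "session lost" event.
lost_deltas = [
    float(m.group(1))
    for m in re.finditer(r"\[ \+([0-9.]+)\] libceph: mon\d .* session lost", dmesg)
]

# Each session dies ~30.7 s after the preceding event.
print(lost_deltas)
```

The near-constant ~30.7 s spacing is the detail worth chasing: it suggests the kernel client's monitor session is expiring on a fixed schedule rather than being cut by an unreliable link.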
In ceph -w I see:
2017-10-23 14:51:57.941131 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 14:56:57.941433 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 14:56:58.124457 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:00:00.000184 mon.pve-hs-main [INF] overall HEALTH_OK
2017-10-23 15:01:57.941312 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:01:57.941558 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 15:06:57.941420 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:06:57.941544 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
2017-10-23 15:11:57.941573 mon.pve-hs-main [INF] mon.1 10.10.10.252:6789/0
2017-10-23 15:11:57.941659 mon.pve-hs-main [INF] mon.2 10.10.10.253:6789/0
pve-hs-main is the host with IP 10.10.10.251.
The Ceph storage is actually under very light load, on average
200 kB/s read or write (as shown by ceph -s), so I don't think the
problem is the load on the cluster.
The strange thing is that I see mon1 10.10.10.252:6789 session lost
in the log of node 10.10.10.252 itself, so the node is losing the
connection to the monitor running on the same host; that is why I
don't think it's network related.
I already tried rebooting the nodes and restarting ceph-mon and
ceph-mgr, but the problem is still there.
Any ideas?
Thanks
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com