On 30/10/2017 10:31, Alwin Antreich wrote:
Hello Marco,
On Mon, Oct 23, 2017 at 05:48:10PM +0200, Marco Baldini - H.S. Amiata wrote:
Hello
ceph-mon services do not restart on any node. Yesterday I manually restarted
ceph-mon and ceph-mgr on every node, and since then they have not restarted.
*pve-hs-2$ systemctl status ceph-mon@pve-hs-2.service*
ceph-mon@pve-hs-2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: *active (running) since Sun 2017-10-22 12:04:22 CEST; 1 day 5h ago*
Main PID: 24825 (ceph-mon)
Tasks: 23
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve-hs-2.service
└─24825 /usr/bin/ceph-mon -f --cluster ceph --id pve-hs-2 --setuser ceph --setgroup ceph
Oct 22 12:04:22 pve-hs-2 systemd[1]: Stopped Ceph cluster monitor daemon.
Oct 22 12:04:22 pve-hs-2 systemd[1]: Started Ceph cluster monitor daemon.
*pve-hs-main$ systemctl status ceph-mon@pve-hs-main.service*
ceph-mon@pve-hs-main.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: *active (running) since Sun 2017-10-22 12:08:59 CEST; 1 day 5h ago*
Main PID: 24857 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve-hs-main.service
└─24857 /usr/bin/ceph-mon -f --cluster ceph --id pve-hs-main --setuser ceph --setgroup ceph
Oct 22 12:08:59 pve-hs-main systemd[1]: Started Ceph cluster monitor daemon.
*pve-hs-3$ systemctl status ceph-mon@pve-hs-3.service*
ceph-mon@pve-hs-3.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: *active (running) since Sun 2017-10-22 12:07:43 CEST; 1 day 5h ago*
Main PID: 13077 (ceph-mon)
Tasks: 23
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve-hs-3.service
└─13077 /usr/bin/ceph-mon -f --cluster ceph --id pve-hs-3 --setuser ceph --setgroup ceph
At 17:38 I have this in the syslog/journal of pve-hs-2:
Oct 23 17:38:47 pve-hs-2 kernel: [255282.309979] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
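To check whether a monitor restarted around a "session lost" event, one option is to compare the unit's start time with the time of the kernel message. A minimal sketch, assuming the text of `systemctl show -p ActiveEnterTimestamp ceph-mon@<id>.service` has been captured; the helper names `parse_active_enter` and `restarted_since` are hypothetical, the timezone abbreviation is simply dropped, and the year of the journal line is assumed to be 2017.

```python
from datetime import datetime


def parse_active_enter(show_output: str) -> datetime:
    """Return ActiveEnterTimestamp from `systemctl show` output.

    Expects a line like 'ActiveEnterTimestamp=Sun 2017-10-22 12:04:22 CEST';
    the weekday and timezone abbreviation are dropped, so the result is a
    naive local time.
    """
    for line in show_output.splitlines():
        key, _, value = line.partition("=")
        if key == "ActiveEnterTimestamp":
            _weekday, date, time = value.split()[:3]
            return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    raise ValueError("ActiveEnterTimestamp not found")


def restarted_since(show_output: str, event: datetime) -> bool:
    """True if the unit (re)entered 'active' after the given event time."""
    return parse_active_enter(show_output) > event


# Hypothetical captured output for pve-hs-2, matching its status shown above.
show = "ActiveEnterTimestamp=Sun 2017-10-22 12:04:22 CEST\n"
# Time of the "session lost" kernel message (year assumed to be 2017).
session_lost = datetime(2017, 10, 23, 17, 38, 47)
print(restarted_since(show, session_lost))  # False: the mon kept running
```

With these values the mon on pve-hs-2 had been running since the day before, which is consistent with the statement that the ceph-mon services do not restart.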
On the same node, my ceph-mon.pve-hs-2.log at 17:38 is:
https://pastebin.com/8BCUm5Mr
Thanks
On 23/10/2017 16:26, Alwin Antreich wrote:
Do the ceph-mon services restart when the session is lost?
What do you see in the ceph-mon.log on the failing mon node?
--
Cheers,
Alwin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
What is in the other ceph/syslog log files? Please also check your
dmesg; maybe there is something wrong with your bond/LACP.
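One way to look at the bond is the per-slave status that the Linux bonding driver exposes under `/proc/net/bonding/<bond>`. A minimal sketch for Python 3.9+, assuming that file's "Slave Interface:", "MII Status:" and "Link Failure Count:" fields; the function name, the bond name and the interface names `eno1`/`eno2` are hypothetical.

```python
def bond_slave_health(proc_text: str) -> dict[str, tuple[str, int]]:
    """Map each slave interface to (MII status, link failure count).

    Parses the text of /proc/net/bonding/<bond>; top-level lines that
    appear before the first 'Slave Interface:' (the bond itself) are skipped.
    """
    slaves: dict[str, list] = {}
    current = None
    for raw in proc_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank or separator lines
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = ["unknown", 0]
        elif current is not None and key == "MII Status":
            slaves[current][0] = value
        elif current is not None and key == "Link Failure Count":
            slaves[current][1] = int(value)
    return {name: (status, fails) for name, (status, fails) in slaves.items()}


# Hypothetical excerpt; real output has more fields (speed, aggregator IDs, ...).
sample = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eno1
MII Status: up
Link Failure Count: 0

Slave Interface: eno2
MII Status: up
Link Failure Count: 3
"""
print(bond_slave_health(sample))  # {'eno1': ('up', 0), 'eno2': ('up', 3)}
```

On a real node one would pass the contents of, for example, `/proc/net/bonding/bond0` (bond name assumed). A link failure count that grows around the times of the "session lost" messages would point at the bond rather than at Ceph.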
Actually, after some server reboots, the problem seems to have solved itself.
That's strange, because there have been no changes to the server or network
configurations.
Only yesterday, I had this in dmesg -xe:
kern :warn : [Oct29 06:39] libceph: mon2 10.10.10.253:6789 socket closed (con state OPEN)
kern :info : [ +0.000029] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern :info : [ +0.031530] libceph: mon0 10.10.10.251:6789 session established
On the other nodes at that time there are no warnings or errors.
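The three kernel lines above read as a client-side failover: the socket to mon2 closed, the session was declared lost, and about 31 ms later (+0.031530 s) a session to mon0 was established. A minimal sketch that extracts such failovers from `dmesg -xe` or journal text, assuming the `libceph: monN <ip>:<port> <event>` wording seen in this thread and IPv4 addresses only; the function names are hypothetical.

```python
import re

# Matches libceph monitor messages as printed by the kernel client, e.g.
# "libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon".
# Only IPv4 addresses are handled (an assumption that fits this cluster).
LIBCEPH_MON = re.compile(r"libceph: (?P<mon>mon\d+) (?P<addr>[\d.]+:\d+) (?P<event>.+)$")


def mon_events(lines):
    """Yield (mon, addr, event) for every libceph monitor message."""
    for line in lines:
        match = LIBCEPH_MON.search(line)
        if match:
            yield match.group("mon"), match.group("addr"), match.group("event")


def failovers(lines):
    """Pair each 'session lost' with the next 'session established'."""
    lost = None
    pairs = []
    for mon, _addr, event in mon_events(lines):
        if event.startswith("session lost"):
            lost = mon
        elif event == "session established" and lost is not None:
            pairs.append((lost, mon))
            lost = None
    return pairs


log = [
    "kern  :warn  : [Oct29 06:39] libceph: mon2 10.10.10.253:6789 socket closed (con state OPEN)",
    "kern  :info  : [  +0.000029] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon",
    "kern  :info  : [  +0.031530] libceph: mon0 10.10.10.251:6789 session established",
]
print(failovers(log))  # [('mon2', 'mon0')]
```

Running the same check on every node for the same time window makes it easy to see whether only one client lost its monitor session or several did at once.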
I think the problem is solved. I don't know how, but Ceph is running
fine now.
Thanks