Hello Stefan,

It's my personal "production" cluster, by the way.
1. The noout, nobackfill, and norecover flags were set before shutting down:
$ ceph osd set noout
$ ceph osd set nobackfill
$ ceph osd set norecover
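Once the cluster is healthy again, the flags should be cleared so recovery can proceed. A minimal sketch (to be run against the live cluster once the mon is reachable):

```shell
# Confirm which cluster-wide flags are currently set:
ceph osd dump | grep flags

# After the cluster is back up and stable, clear the flags again:
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout
```

Leaving these flags set indefinitely would prevent the OSDs from ever recovering or rebalancing.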
2. firewalld is disabled, and the mon is listening on 6789 but nothing is listening on 3300:
[root@ceph-node1 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
[root@ceph-node1 ~]# netstat -antp|grep 6789
tcp 0 0 192.168.1.6:6789 0.0.0.0:* LISTEN 474841/ceph-mon
[root@ceph-node1 ~]# netstat -antp|grep 3300
[root@ceph-node1 ~]#
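The missing listener on port 3300 may simply mean the msgr2 protocol was never enabled; on Nautilus this is an explicit one-time step. A sketch, assuming the mon becomes reachable at all:

```shell
# Enable the v2 (msgr2) protocol on all monitors (Nautilus and later):
ceph mon enable-msgr2

# Verify that both v1 (6789) and v2 (3300) addresses are now in the monmap:
ceph mon dump
```

Clients and daemons on 14.2.x will prefer v2 once it is advertised, but v1 on 6789 keeps working either way.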
3. The OSD, MDS, and MGR logs are all empty:
[root@ceph-node1 ceph]# ls -lh *.log
-rw------- 1 ceph ceph 0 Dec 11 03:09 ceph.audit.log
-rw------- 1 ceph ceph 3.7K Dec 11 08:36 ceph.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-mds.ceph-node1.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-mgr.ceph-node1.log
-rw-r--r-- 1 ceph ceph 2.2M Dec 11 14:42 ceph-mon.ceph-node1.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.0.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.10.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.11.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.1.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.2.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.3.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.4.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.5.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.6.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.7.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.8.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.9.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-rgw-ceph-node1.rgw0.log
-rw-r--r-- 1 root root 0 Dec 11 03:09 ceph-volume.log
-rw-r--r-- 1 root root 0 Dec 11 03:09 ceph-volume-systemd.log
4. [root@ceph-node1 ceph]# ceph -s
just blocks, then fails with error 111 (connection refused) after a few hours.
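Since the client cannot reach the mon over the network, the mon can be queried locally through its admin socket, which bypasses the messenger entirely. A sketch, assuming the default socket location under /var/run/ceph:

```shell
# Ask the local mon daemon for its own status, bypassing the network:
ceph daemon mon.ceph-node1 mon_status

# Equivalent form, naming the admin socket path explicitly:
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node1.asok mon_status

# Also confirm the client side points at the new mon address:
grep -E 'mon[ _]host' /etc/ceph/ceph.conf
```

If mon_status answers on the socket but `ceph -s` still blocks, the problem is between client and mon (address, port, or monmap), not inside the mon itself.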
------------------ Original Message ------------------
From: "Stefan Kooman" <stefan@xxxxxx>;
Sent: Wednesday, December 11, 2019, 2:37 PM
To: "Cc君" <occj@xxxxxx>;
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>;
Subject: Re: [ceph-users] ceph-mon is blocked after shutting down and ip address changed
> ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
>
> os :CentOS Linux release 7.7.1908 (Core)
> single node ceph cluster with 1 mon,1mgr,1 mds,1rgw and 12osds , but only cephfs is used.
> ceph -s is blocked after shutting down the machine (192.168.0.104), then ip address changed to 192.168.1.6
>
> I created the monmap with the monmap tool, updated ceph.conf and the hosts file, and then started ceph-mon.
> and the ceph-mon log:
> ...
> 2019-12-11 08:57:45.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1285.14s
> 2019-12-11 08:57:50.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1290.14s
> 2019-12-11 08:57:55.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1295.14s
> 2019-12-11 08:58:00.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1300.14s
> 2019-12-11 08:58:05.172 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1305.14s
> 2019-12-11 08:58:10.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1310.14s
> 2019-12-11 08:58:15.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1315.14s
> 2019-12-11 08:58:20.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1320.14s
> 2019-12-11 08:58:25.174 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1325.14s
>
> ...
>
>
> I changed the IP back to 192.168.0.104 yesterday, but the result was the same.
Just checking here: do you run a firewall? Is port 3300 open (besides
6789)?
What do you see in the logs on the MDS and the OSDs? There are timers
configured in the MON / OSD for the case where they cannot reach each
other in time; OSDs might get marked out. But I'm unsure what the status
of your cluster is. Could you paste a "ceph -s"?
Gr. Stefan
P.s. BTW: is this running production?
--
| BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com