Hello Stefan,

It's my personal "production" cluster, by the way.
1. The noout, nobackfill, and norecover flags were set before shutting down:
$ ceph osd set noout
$ ceph osd set nobackfill
$ ceph osd set norecover
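Once the cluster is healthy again, the flags should be cleared so recovery can proceed. A minimal sketch (to be run against the live cluster once the mon is reachable):

```shell
# Confirm which cluster-wide flags are currently set:
ceph osd dump | grep flags

# After the cluster is back up and stable, clear the flags again:
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout
```

Leaving these flags set indefinitely would prevent the OSDs from ever recovering or rebalancing.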
2. firewalld is disabled, and the mon is listening on 6789 but nothing is listening on 3300:
[root@ceph-node1 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
[root@ceph-node1 ~]# netstat -antp|grep 6789
tcp 0 0 192.168.1.6:6789 0.0.0.0:* LISTEN 474841/ceph-mon
[root@ceph-node1 ~]# netstat -antp|grep 3300
[root@ceph-node1 ~]#
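The missing listener on port 3300 may simply mean the msgr2 protocol was never enabled; on Nautilus this is an explicit one-time step. A sketch, assuming the mon becomes reachable at all:

```shell
# Enable the v2 (msgr2) protocol on all monitors (Nautilus and later):
ceph mon enable-msgr2

# Verify that both v1 (6789) and v2 (3300) addresses are now in the monmap:
ceph mon dump
```

Clients and daemons on 14.2.x will prefer v2 once it is advertised, but v1 on 6789 keeps working either way.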
3. The OSD, MDS, and MGR logs are all empty:
[root@ceph-node1 ceph]# ls -lh *.log
-rw------- 1 ceph ceph 0 Dec 11 03:09 ceph.audit.log
-rw------- 1 ceph ceph 3.7K Dec 11 08:36 ceph.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-mds.ceph-node1.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-mgr.ceph-node1.log
-rw-r--r-- 1 ceph ceph 2.2M Dec 11 14:42 ceph-mon.ceph-node1.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.0.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.10.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.11.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.1.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.2.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.3.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.4.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.5.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.6.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.7.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.8.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-osd.9.log
-rw-r--r--. 1 ceph ceph 0 Dec 9 03:19 ceph-rgw-ceph-node1.rgw0.log
-rw-r--r-- 1 root root 0 Dec 11 03:09 ceph-volume.log
-rw-r--r-- 1 root root 0 Dec 11 03:09 ceph-volume-systemd.log
4. [root@ceph-node1 ceph]# ceph -s
just blocks, then fails with error 111 (connection refused) after a few hours.
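Since the client cannot reach the mon over the network, the mon can be queried locally through its admin socket, which bypasses the messenger entirely. A sketch, assuming the default socket location under /var/run/ceph:

```shell
# Ask the local mon daemon for its own status, bypassing the network:
ceph daemon mon.ceph-node1 mon_status

# Equivalent form, naming the admin socket path explicitly:
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node1.asok mon_status

# Also confirm the client side points at the new mon address:
grep -E 'mon[ _]host' /etc/ceph/ceph.conf
```

If mon_status answers on the socket but `ceph -s` still blocks, the problem is between client and mon (address, port, or monmap), not inside the mon itself.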
------------------ Original Message ------------------
From: "Stefan Kooman" <stefan@xxxxxx>;
Sent: Wednesday, December 11, 2019, 2:37 PM
To: "Cc君" <occj@xxxxxx>;
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>;
Subject: Re: [ceph-users] ceph-mon is blocked after shutting down and ip address changed
> ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
>
> os :CentOS Linux release 7.7.1908 (Core)
> single node ceph cluster with 1 mon,1mgr,1 mds,1rgw and 12osds , but only cephfs is used.
> ceph -s is blocked after shutting down the machine (192.168.0.104), then ip address changed to 192.168.1.6
>
> I created the monmap with the monmap tool, updated ceph.conf and the hosts file, and then started ceph-mon.
> and the ceph-mon log:
> ...
> 2019-12-11 08:57:45.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1285.14s
> 2019-12-11 08:57:50.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1290.14s
> 2019-12-11 08:57:55.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1295.14s
> 2019-12-11 08:58:00.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1300.14s
> 2019-12-11 08:58:05.172 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1305.14s
> 2019-12-11 08:58:10.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1310.14s
> 2019-12-11 08:58:15.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1315.14s
> 2019-12-11 08:58:20.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1320.14s
> 2019-12-11 08:58:25.174 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1325.14s
>
> ...
>
>
> I changed the IP back to 192.168.0.104 yesterday, but the result was the same.
Just checking here: do you run a firewall? Is port 3300 open (besides
6789)?
What do you see in the logs on the MDS and the OSDs? There are timers
configured in the MON / OSD for the case where they cannot reach each
other in time; OSDs might get marked out. But I'm unsure what the status
of your cluster is. Could you paste a "ceph -s"?
Gr. Stefan
P.s. BTW: is this running production?
--
| BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com