Re: just-rebuilt mon does not join the cluster

Looking at my mon rebuild instructions, I see a few extra steps that your command list didn't include (a 'ceph mon add ...' step and a 'ceph-mon --inject-monmap' step):

cd /var/lib/ceph/tmp
ceph auth get mon. -o mon-auth
ceph mon getmap -o mon-map
ceph-mon -i $MONHOST --mkfs --monmap mon-map --keyring mon-auth
ceph mon add $MONHOST $MONIP
ceph mon getmap -o mon-map
ceph-mon -i $MONHOST --inject-monmap mon-map
chown -R ceph:ceph /var/lib/ceph/mon
rm mon-auth mon-map

You might want to give this procedure a try.
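
For convenience, the list above could be wrapped in a small shell function. This is only a sketch: the names rebuild_mon, step and DRY_RUN are made up for illustration, while the ceph commands are the ones from the list above. By default it only prints the commands so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
step() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the rebuild steps listed above.
# $1 = mon name (MONHOST), $2 = mon IP address (MONIP).
rebuild_mon() {
    local monhost=$1 monip=$2

    step cd /var/lib/ceph/tmp
    # Fetch the mon keyring and the current monmap from the running cluster.
    step ceph auth get mon. -o mon-auth
    step ceph mon getmap -o mon-map
    # Create a fresh data directory for this mon.
    step ceph-mon -i "$monhost" --mkfs --monmap mon-map --keyring mon-auth
    # Register the mon, then fetch the updated monmap and inject it.
    step ceph mon add "$monhost" "$monip"
    step ceph mon getmap -o mon-map
    step ceph-mon -i "$monhost" --inject-monmap mon-map
    # Fix ownership and remove the temporary files.
    step chown -R ceph:ceph /var/lib/ceph/mon
    step rm mon-auth mon-map
}
```

Calling "rebuild_mon mon1 X.Y.1.1" prints the nine commands in order. With DRY_RUN=0 the same call would execute them on the mon host, and set -e should make it stop at the first command that fails.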

Andras


On 9/8/22 10:54, Jan Kasprzak wrote:
Jan Kasprzak wrote:
: 	Hello,
:
: I had to rebuild the data directory of one of my mons, but now I can't get
: the new mon to join the cluster. What I did was based on this documentation:
: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/
:
: ceph mon remove mon1
:
: ssh root@mon1
: mkdir /var/lib/ceph/mon/tmp
: mkdir /var/lib/ceph/mon/ceph-mon1
: ceph auth get mon. -o /var/lib/ceph/mon/tmp/keyring
: ceph mon getmap -o /var/lib/ceph/mon/tmp/monmap
: ceph-mon -i mon1 --mkfs --monmap /var/lib/ceph/mon/tmp/monmap --keyring /var/lib/ceph/mon/tmp/keyring
: chown -R ceph:ceph /var/lib/ceph/mon/ceph-mon1
: systemctl start ceph-mon@mon1.service

I rebooted the mon1 node and repeated the steps above.
Now I no longer see the "failed to assign global_id" log messages, but I get
these instead:

2022-09-08T16:38:10.826+0200 7f72bb1a9700  0 mon.mon1 does not exist in monmap, will attempt to join an existing cluster
2022-09-08T16:38:10.827+0200 7f72bb1a9700  0 using public_addr v2:X.Y.1.1:0/0 -> [v2:X.Y.1.1:3300/0,v1:X.Y.1.1:6789/0]
2022-09-08T16:38:10.828+0200 7f72bb1a9700  0 starting mon.mon1 rank -1 at public addrs [v2:X.Y.1.1:3300/0,v1:X.Y.1.1:6789/0] at bind addrs [v2:X.Y.1.1:3300/0,v1:X.Y.1.1:6789/0] mon_data /var/lib/ceph/mon/ceph-mon1 fsid 123...my-fsid...def
2022-09-08T16:38:10.834+0200 7f72bb1a9700  1 mon.mon1@-1(???) e0 preinit fsid 123...my-fsid...def

So now it apparently tries to join the existing cluster.
I also see traffic coming from the other two mons to port 3300 of this host,
so they are trying to communicate.

But in "ceph -s" there remain only two mons.
The fsid and the public IP address appear to be correct.

How can I debug this further? Thanks,

-Yenya
