Re: just-rebuilt mon does not join the cluster

This might be the same problem I had. Try setting

mon_sync_max_payload_size               4096

Searching this list for that option name will turn up the background.
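
A minimal sketch of how the option could be applied, assuming a release with the centralized config database (Mimic or later) and a node with a working admin keyring. The "mon" target applies the value to all monitors:

```shell
# Assumption: run from a node with a working admin keyring.
# Set the sync payload limit for all monitors in the config database.
ceph config set mon mon_sync_max_payload_size 4096

# Confirm the value that is now stored.
ceph config get mon mon_sync_max_payload_size
```

On clusters that are configured through ceph.conf instead, the equivalent would be the same option in the [mon] section, followed by a restart of the mons.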

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Jan Kasprzak <kas@xxxxxxxxxx>
Sent: 08 September 2022 16:54:52
To: ceph-users@xxxxxxx
Subject:  Re: just-rebuilt mon does not join the cluster

Jan Kasprzak wrote:
:       Hello,
:
: I had to rebuild the data directory of one of my mons, but now I can't get
: the new mon to join the cluster. What I did was based on this documentation:
: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/
:
: ceph mon remove mon1
:
: ssh root@mon1
: mkdir /var/lib/ceph/mon/tmp
: mkdir /var/lib/ceph/mon/ceph-mon1
: ceph auth get mon. -o /var/lib/ceph/mon/tmp/keyring
: ceph mon getmap -o /var/lib/ceph/mon/tmp/monmap
: ceph-mon -i mon1 --mkfs --monmap /var/lib/ceph/mon/tmp/monmap --keyring /var/lib/ceph/mon/tmp/keyring
: chown -R ceph:ceph /var/lib/ceph/mon/ceph-mon1
: systemctl start ceph-mon@mon1.service

I rebooted the mon1 node and repeated the steps above.
Now I don't see the "failed to assign global_id" log messages, but instead
this one:

2022-09-08T16:38:10.826+0200 7f72bb1a9700  0 mon.mon1 does not exist in monmap, will attempt to join an existing cluster
2022-09-08T16:38:10.827+0200 7f72bb1a9700  0 using public_addr v2:X.Y.1.1:0/0 -> [v2:X.Y.1.1:3300/0,v1:X.Y.1.1:6789/0]
2022-09-08T16:38:10.828+0200 7f72bb1a9700  0 starting mon.mon1 rank -1 at public addrs [v2:X.Y.1.1:3300/0,v1:X.Y.1.1:6789/0] at bind addrs [v2:X.Y.1.1:3300/0,v1:X.Y.1.1:6789/0] mon_data /var/lib/ceph/mon/ceph-mon1 fsid 123...my-fsid...def
2022-09-08T16:38:10.834+0200 7f72bb1a9700  1 mon.mon1@-1(???) e0 preinit fsid 123...my-fsid...def

So now it apparently tries to join the existing cluster.
I also see traffic coming from the other two mons to port 3300 of this host,
so they are trying to communicate.

But in "ceph -s" there remain only two mons.
The fsid and the public IP address appear to be correct.
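
As a rough sketch, the rank reported at startup can be pulled out of the mon log with a small helper. The function name mon_start_rank is hypothetical, and the pattern only assumes the "starting mon.<id> rank <n>" line format shown above. A rank of -1 matches the "does not exist in monmap" message:

```shell
# Hypothetical helper: print "<mon id> <rank>" for every startup line
# found in the given log files, or on stdin when no file is given.
mon_start_rank() {
    sed -n 's/^.* starting mon\.\([^ ]*\) rank \(-\{0,1\}[0-9]\{1,\}\) at public addrs.*$/\1 \2/p' "$@"
}

# Example call; the log path is an assumption based on the default layout:
#   mon_start_rank /var/log/ceph/ceph-mon.mon1.log
```

Once the mon has joined, the same line should report a non-negative rank.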

How can I debug this further? Thanks,

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
    We all agree on the necessity of compromise. We just can't agree on
    when it's necessary to compromise.                     --Larry Wall
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx