Re: Monitor unable to join existing cluster, stuck at probing

This does sound like a network problem. Try increasing the log levels for
the mon (debug_mon = 10/10) and maybe the messenger (debug_ms = 5/5 or
10/10, which is very noisy) to see where it is stuck.
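
For reference, a minimal sketch of how these levels could be set
persistently on the new monitor. It assumes the daemon id is mon4 (taken
from the log below) and that ceph.conf sits in the usual /etc/ceph
location; both the section name and the path are assumptions, not
something confirmed in this thread.

```ini
# /etc/ceph/ceph.conf on the mon4 host (path assumed; adjust to your deployment)
[mon.mon4]
    # monitor subsystem: log level / in-memory level
    debug_mon = 10/10
    # messenger subsystem: 5/5 is usually enough, 10/10 is very noisy
    debug_ms = 5/5
```

The mon has to be restarted to pick this up. Since a probing monitor is
not in quorum, it can only be reached locally; something like
`ceph daemon mon.mon4 config set debug_mon 10/10` over the admin socket
should also work without a restart. Lower the levels again once you have
the output you need, because the log grows quickly.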


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Oct 16, 2019 at 1:42 PM <msmit@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> I'm currently working on upgrading the existing monitors in my cluster. During the first deployment of this production cluster I made some choices that in hindsight were not the best. But it worked, I learned, and now I wish to remedy my previous bad choices.
>
> The cluster consists of three monitors that are currently in quorum, and I wish to upgrade each of them by fully removing it from the cluster and rejoining it after a complete reinstall of the OS (new hostname, new IP). To maintain quorum in the meantime I want to temporarily add a fourth monitor, but this doesn't go as planned: the monitor joins with `ceph-deploy add monitor mon4` but never leaves the probing state (see log below).
>
> I have verified all networking and firewall settings and don't see any connection errors, nor do I see any unexpected hostnames or IP addresses in the existing monmap on any of the hosts. I also manually confirmed that all the keys on the cluster are the same, so I don't suspect an authentication error.
>
> I hope someone can offer some guidance.
>
> Thx.
>
> Log from mon4 > /var/log/ceph/ceph-mon.mon4.log
>
> 2019-10-16 11:21:51.960 7fc709c73a00  0 mon.mon4 does not exist in monmap, will attempt to join an existing cluster
> 2019-10-16 11:21:51.962 7fc709c73a00  0 using public_addr 10.200.1.104:0/0 -> 10.200.1.104:6789/0
> 2019-10-16 11:21:51.963 7fc709c73a00  0 starting mon.mon4 rank -1 at public addr 10.200.1.104:6789/0 at bind addr 10.200.1.104:6789/0 mon_data /var/lib/ceph/mon/ceph-mon4 fsid aaf1547b-8944-4f48-b354-93659202c6fe
> 2019-10-16 11:21:51.964 7fc709c73a00  0 starting mon.mon4 rank -1 at 10.200.1.104:6789/0 mon_data /var/lib/ceph/mon/ceph-mon4 fsid aaf1547b-8944-4f48-b354-93659202c6fe
> 2019-10-16 11:21:51.965 7fc709c73a00  1 mon.mon4@-1(probing) e0 preinit fsid aaf1547b-8944-4f48-b354-93659202c6fe
> 2019-10-16 11:21:51.965 7fc709c73a00  1 mon.mon4@-1(probing) e0  initial_members mon1,mon2,mon3,mon4, filtering seed monmap
> 2019-10-16 11:21:51.965 7fc709c73a00  1 mon.mon4@-1(probing).mds e0 Unable to load 'last_metadata'
> 2019-10-16 11:21:51.967 7fc709c73a00  0 mon.mon4@-1(probing) e0  my rank is now 3 (was -1)
> 2019-10-16 11:21:54.054 7fc6f934b700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
> 2019-10-16 11:21:54.054 7fc6f934b700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
> 2019-10-16 11:21:54.300 7fc6f934b700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
> 2019-10-16 11:21:54.300 7fc6f934b700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
> 2019-10-16 11:22:26.967 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
> 2019-10-16 11:22:31.967 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
> 2019-10-16 11:22:36.967 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
> 2019-10-16 11:22:37.478 7fc6f934b700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
> 2019-10-16 11:22:37.478 7fc6f934b700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
> 2019-10-16 11:22:41.968 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
> 2019-10-16 11:22:46.968 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



