Hi,
Does anyone have any suggestions?
Thanks,
Chris
On Mon, Aug 30, 2021 at 03:52:29PM +1000, Chris Dunlop wrote:
Hi,
I'm stuck mid-upgrade from octopus to pacific using cephadm, at the
point of upgrading the mons.
I have 3 mons still on octopus and in quorum. When I try to bring up a
new pacific mon it stays permanently in "probing" state.
The pacific mon is running off:
docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
The lead octopus mon is running off:
quay.io/ceph/ceph:v15
The other two octopus mons are running 15.2.14-1~bpo10+1. I started
these manually because the cephadm upgrade failed at the point of
upgrading the mons, leaving me with only one cephadm mon running.
I've confirmed that all mons (current and new) can contact each other
on ports 3300 and 6789, and that maximum-MTU packets (9000 bytes) get
through in all directions.
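
For anyone wanting to repeat the port check, here is a minimal sketch
in Python. The function names port_open and check_mons are made up for
this example, and it only tests that a TCP connection can be opened;
it says nothing about MTU or about whether the mon protocol itself is
working.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_mons(hosts, ports=(3300, 6789), timeout=2.0):
    """Map each (host, port) pair to whether it accepted a connection.

    3300 is the msgr2 port and 6789 the legacy msgr1 port, as quoted
    in the monmap lines of the log.
    """
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

Run from each mon host in turn, check_mons(["10.200.63.130",
"10.200.63.132", "192.168.254.251"]) would cover the three addresses
listed in the monmap; any False value points at a blocked or closed
port.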
On the box where I'm trying to start the pacific mon, if I start up an
octopus mon it happily joins the mon set.
With debug_mon=20 on the pacific mon I see *constant* repeated
mon_probe reply processing. The first mon_probe reply produces:
e0 got newer/committed monmap epoch 35, mine was 0
Subsequent mon_probe replies produce:
e35 got newer/committed monmap epoch 35, mine was 35
...but this just keeps repeating and it never gets any further - see
below.
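
To put a number on "keeps repeating", a small Python sketch like the
following can tally those messages from a saved log. The regex matches
only the message text quoted above; the function name
summarize_probe_replies and the SAMPLE list (lines trimmed from the
log) are just for illustration.

```python
import re
from collections import Counter

# Matches the mon's own message, e.g.
# "got newer/committed monmap epoch 35, mine was 35"
PATTERN = re.compile(r"got newer/committed monmap epoch (\d+), mine was (\d+)")


def summarize_probe_replies(lines):
    """Count (received_epoch, local_epoch) pairs seen in the log lines."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


# Lines trimmed from the log in this post (timestamps removed).
SAMPLE = [
    "mon.b5@-1(probing) e0 got newer/committed monmap epoch 35, mine was 0",
    "mon.b5@-1(probing) e35 bootstrap",
    "mon.b5@-1(probing) e35 got newer/committed monmap epoch 35, mine was 35",
    "mon.b5@-1(probing) e35 got newer/committed monmap epoch 35, mine was 35",
]

counts = summarize_probe_replies(SAMPLE)
```

A large count for a pair where both epochs are equal, such as (35, 35),
is the loop described here: the mon already holds the epoch it is being
sent, yet goes back to bootstrap each time.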
Where to from here?
Cheers,
Chris
----------------------------------------------------------------------
debug_mon=20 from pacific mon
----------------------------------------------------------------------
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0 monmap is e0: 3 mons at {noname-a=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],noname-b=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],noname-c=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0 got newer/committed monmap epoch 35, mine was 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 _reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 entity_name global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow so far , doing grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow all
--
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap is e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 got newer/committed monmap epoch 35, mine was 35
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 _reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 entity_name global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow so far , doing grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow all
--
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap is e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 got newer/committed monmap epoch 35, mine was 35
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 _reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 entity_name global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow so far , doing grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow all
...and repeats constantly
----------------------------------------------------------------------
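
The monmap lines in the log above can also be picked apart to see which
mon names each epoch contains. This is only a sketch: parse_monmap_line
is a hypothetical helper, and the regexes are written against the log
format shown in this post, not against any documented format.

```python
import re

# Matches both "monmap e35: 3 mons at {...}" and "monmap is e0: 3 mons at {...}"
MONMAP_RE = re.compile(r"monmap (?:is )?e(\d+): \d+ mons at \{(.*)\}")
# A mon name is whatever precedes "=[" inside the braces.
NAME_RE = re.compile(r"([\w.-]+)=\[")


def parse_monmap_line(line):
    """Return (epoch, [mon names]) for a monmap log line, or None."""
    m = MONMAP_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), NAME_RE.findall(m.group(2))


# Message portion of one monmap line from the log above.
LINE_E35 = (
    "mon.b5@-1(probing) e35 monmap e35: 3 mons at "
    "{b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],"
    "b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],"
    "k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}"
)

epoch, names = parse_monmap_line(LINE_E35)
new_mon_listed = "b5" in names
```

For the e35 line this yields the names b2, b4 and k2, so the new mon b5
does not appear in the monmap it receives, which fits the "@-1" rank
shown in its log prefix.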