Could you please verify that the monmap of each mon contains all of the
mons, and that every entry is correct?
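
One way to make that comparison less error-prone is to diff the monmap lines that each mon prints in its debug log. The following is only a sketch: it assumes the `{name=[addr,addr],...}` layout seen in the log further down, and `parse_monmap` and `diff_monmaps` are hypothetical helper names, not part of any Ceph tool.

```python
import re

# Matches one "name=[addr,addr,...]" entry inside the braces of a monmap line.
ENTRY_RE = re.compile(r"([^=,\[\]\s]+)=\[([^\]]*)\]")


def parse_monmap(line):
    """Turn '... {b2=[v2:...,v1:...],b4=[...]}' into {name: (addr, ...)}.

    Raises ValueError if the line contains no '{...}' section.
    """
    body = line[line.index("{") + 1 : line.rindex("}")]
    return {name: tuple(addrs.split(","))
            for name, addrs in ENTRY_RE.findall(body)}


def diff_monmaps(a, b):
    """Return {name: (addrs_in_a, addrs_in_b)} for every entry that differs.

    A mon missing from one side shows up with None on that side.
    """
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}
```

An empty result from `diff_monmaps` means the two maps list the same mon names with the same addresses; it says nothing about epochs or features.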
On 30.08.21 at 21:45, Chris Dunlop wrote:
Hi,
Does anyone have any suggestions?
Thanks,
Chris
On Mon, Aug 30, 2021 at 03:52:29PM +1000, Chris Dunlop wrote:
Hi,
I'm stuck, mid upgrade from octopus to pacific using cephadm, at the
point of upgrading the mons.
I have 3 mons still on octopus and in quorum. When I try to bring up
a new pacific mon it stays permanently in "probing" state.
The pacific mon is running off:
docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
The lead octopus mon is running off:
quay.io/ceph/ceph:v15
The other 2 octopus mons are 15.2.14-1~bpo10+1. These are manually
started due to the cephadm upgrade failing at the point of upgrading
the mons and leaving me with only one cephadm mon running.
I've confirmed that all mons (current and new) can contact each other
on ports 3300 and 6789, and that maximum-MTU (9000-byte) packets get
through in all directions.
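
For anyone wanting to repeat the port check, a minimal sketch follows. It only tests that a TCP connection can be opened from the host it runs on, so it would need to be run on every mon host to cover all directions, and it does not test MTU. The function names are made up for illustration.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_mons(hosts, ports=(3300, 6789)):
    """Map every (host, port) pair to whether it accepted a connection."""
    return {(host, port): port_open(host, port)
            for host in hosts
            for port in ports}
```

For example, `check_mons(["10.200.63.130", "10.200.63.132", "192.168.254.251"])` would probe the three mon addresses that appear in the monmap below on both the v2 (3300) and v1 (6789) ports.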
On the box where I'm trying to start the pacific mon, if I start up
an octopus mon it happily joins the mon set.
With debug_mon=20 on the pacific mon I see *constant* repeated
mon_probe reply processing. The first mon_probe reply produces:
e0 got newer/committed monmap epoch 35, mine was 0
Subsequent mon_probe replies produce:
e35 got newer/committed monmap epoch 35, mine was 35
...but this just keeps repeating and it never gets any further - see
below.
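
The repetition can be counted rather than eyeballed. The sketch below assumes the exact message wording quoted above and tolerates the line wrapping introduced by mail clients; `probe_epochs` and `looks_stuck` are hypothetical names, and the threshold is an arbitrary choice.

```python
import re
from collections import Counter

# \s+ lets the pattern survive log lines that were wrapped by a mail client.
PROBE_RE = re.compile(
    r"got newer/committed monmap epoch\s+(\d+),\s+mine was\s+(\d+)"
)


def probe_epochs(log_text):
    """Count occurrences of each (their_epoch, my_epoch) pair in the log."""
    return Counter((int(theirs), int(mine))
                   for theirs, mine in PROBE_RE.findall(log_text))


def looks_stuck(log_text, threshold=2):
    """True if the same epoch is reported as 'newer' at least threshold times."""
    return any(theirs == mine and count >= threshold
               for (theirs, mine), count in probe_epochs(log_text).items())
```

This only detects the symptom (the mon keeps re-bootstrapping on a monmap it already has); it does not say why.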
Where to from here?
Cheers,
Chris
----------------------------------------------------------------------
debug_mon=20 from pacific mon
----------------------------------------------------------------------
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0
handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 )
mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0
handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply
c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0
paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0
monmap is e0: 3 mons at
{noname-a=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],noname-b=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],noname-c=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0
got newer/committed monmap epoch 35, mine was 0
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
monmap e35: 3 mons at
{b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth
v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35
health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35
health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35
_ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35
entity_name global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon
command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow so far , doing
grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow all
--
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 )
mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply
c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0
paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
monmap is e35: 3 mons at
{b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
got newer/committed monmap epoch 35, mine was 35
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
monmap e35: 3 mons at
{b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth
v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35
health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35
health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35
_ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35
entity_name global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon
command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow so far , doing
grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow all
--
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 )
mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply
c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0
paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
monmap is e35: 3 mons at
{b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
got newer/committed monmap epoch 35, mine was 35
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
monmap e35: 3 mons at
{b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth
v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35
health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35
health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35
probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35
_ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35
entity_name global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon
command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow so far , doing
grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug
2021-08-28T22:25:34.792+0000 7f74f223a700 20 allow all
...and repeats constantly
----------------------------------------------------------------------
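
To summarise the probe replies in a log like the one above, the fields of each `mon_probe(reply ...)` message can be pulled out with a regular expression. This is a sketch that assumes the field order shown in this particular log (fsid, name, quorum, leader, paxos fc/lc, mon_release); other releases may format the message differently, and `probe_replies` is a made-up name.

```python
import re

# Field order is taken from the log excerpt; \s+ absorbs wrapped lines.
REPLY_RE = re.compile(
    r"mon_probe\(reply\s+(?P<fsid>[0-9a-f-]+)"
    r"\s+name\s+(?P<name>\S+)"
    r"\s+quorum\s+(?P<quorum>[\d,]+)"
    r"\s+leader\s+(?P<leader>-?\d+)"
    r"\s+paxos\(\s*fc\s+(?P<fc>\d+)\s+lc\s+(?P<lc>\d+)\s*\)"
    r"\s+mon_release\s+(?P<release>\w+)\)"
)


def probe_replies(log_text):
    """Return one dict per mon_probe reply found in the log text."""
    replies = []
    for match in REPLY_RE.finditer(log_text):
        fields = match.groupdict()
        replies.append({
            "fsid": fields["fsid"],
            "name": fields["name"],
            "quorum": [int(rank) for rank in fields["quorum"].split(",")],
            "leader": int(fields["leader"]),
            "first_committed": int(fields["fc"]),
            "last_committed": int(fields["lc"]),
            "release": fields["release"],
        })
    return replies
```

Note that each reply appears twice in this log (once under `handle_probe`, once under `handle_probe_reply`), so the list will contain duplicates unless they are filtered out.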
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx