Re: mon startup problem on upgrade octopus to pacific

Could you please verify that the monmap held by each mon lists all of the mons, and that every entry in it is correct?
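One way to compare what each mon believes is to pull the monmap line out of its debug log and diff the names and addresses. The following is only a sketch: the regular expressions are written against the log format quoted further down in this thread, and `parse_monmap` and `missing_mons` are hypothetical helper names, not part of any Ceph tooling.

```python
import re

# Matches the body of a monmap log line such as
#   "monmap e35: 3 mons at {b2=[v2:...,v1:...],b4=[...]}"
# (format assumed from the debug_mon=20 output quoted in this thread).
MONMAP_RE = re.compile(r"monmap (?:is )?e(\d+): (\d+) mons at \{(.*)\}")
ENTRY_RE = re.compile(r"([^=,\[\]]+)=\[([^\]]*)\]")


def parse_monmap(line):
    """Return (epoch, {name: [addr, ...]}) for a monmap log line, or None."""
    m = MONMAP_RE.search(line)
    if m is None:
        return None
    epoch = int(m.group(1))
    mons = {name: addrs.split(",") for name, addrs in ENTRY_RE.findall(m.group(3))}
    return epoch, mons


def missing_mons(mons, expected_names):
    """Names we expect to see but that the monmap does not list."""
    return sorted(set(expected_names) - set(mons))


if __name__ == "__main__":
    line = ("mon.b5@-1(probing) e35 monmap e35: 3 mons at "
            "{b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],"
            "b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],"
            "k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}")
    epoch, mons = parse_monmap(line)
    print(epoch, sorted(mons))
    print(missing_mons(mons, ["b2", "b4", "k2", "b5"]))
```

Run against the e35 line from the quoted log, this lists b2, b4 and k2 and reports b5 as absent. Whether that is expected for a mon that has not yet joined is a separate question; the point is only to make the per-mon comparison mechanical.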

On 30.08.21 at 21:45, Chris Dunlop wrote:
Hi,

Does anyone have any suggestions?

Thanks,

Chris

On Mon, Aug 30, 2021 at 03:52:29PM +1000, Chris Dunlop wrote:
Hi,

I'm stuck, mid upgrade from octopus to pacific using cephadm, at the point of upgrading the mons.

I have 3 mons still on octopus and in quorum. When I try to bring up a new pacific mon it stays permanently in "probing" state.

The pacific mon is running off:

docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb

The lead octopus mon is running off:

quay.io/ceph/ceph:v15

The other 2 octopus mons are 15.2.14-1~bpo10+1. I started these manually because the cephadm upgrade failed at the point of upgrading the mons, leaving me with only one cephadm-managed mon running.

I've confirmed that all mons (current and new) can contact each other on ports 3300 and 6789, and that maximum-MTU (9000-byte) packets get through in all directions.
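For anyone wanting to repeat the port check, a minimal sketch follows. It only tests that a TCP connection can be opened to each mon on the two messenger ports; it says nothing about MTU, and `port_open` and `check_mons` are hypothetical helper names. The addresses are the ones that appear in the monmap in the log below.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_mons(hosts, ports=(3300, 6789), timeout=3.0):
    """Map (host, port) -> reachable? for every mon host and messenger port."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}


if __name__ == "__main__":
    # Addresses taken from the monmap in the log; adjust for your cluster.
    mon_hosts = ["10.200.63.130", "10.200.63.132", "192.168.254.251"]
    for (host, port), ok in check_mons(mon_hosts).items():
        print(f"{host}:{port} {'ok' if ok else 'UNREACHABLE'}")
```

Running it from every mon host in turn covers the "all directions" part of the check.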

On the box where I'm trying to start the pacific mon, if I start up an octopus mon it happily joins the mon set.

With debug_mon=20 on the pacific mon I see *constant* repeated mon_probe reply processing. The first mon_probe reply produces:

e0  got newer/committed monmap epoch 35, mine was 0

Subsequent mon_probe replies produce:

e35 got newer/committed monmap epoch 35, mine was 35

...but this just keeps repeating and it never gets any further - see below.

Where to from here?

Cheers,

Chris

----------------------------------------------------------------------
debug_mon=20 from pacific mon
----------------------------------------------------------------------
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0  monmap is e0: 3 mons at {noname-a=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],noname-b=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],noname-c=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e0  got newer/committed monmap epoch 35, mine was 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 _reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35  entity_name  global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20  allow so far , doing grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20  allow all
--
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35  monmap is e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35  got newer/committed monmap epoch 35, mine was 35
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 _reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35  entity_name  global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20  allow so far , doing grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20  allow all
--
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 364908695 lc 364909318 ) mon_release octopus) v7
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35  monmap is e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35  got newer/committed monmap epoch 35, mine was 35
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 bootstrap
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5564a433c900
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 _reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5564a433c900 after 2 seconds
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5564a42fe900 for mon.2
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 mon.b5@-1(probing) e35  entity_name  global_id 0 (none) caps allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20 is_capable service=mon command= read addr v2:10.200.63.132:3300/0 on cap allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20  allow so far , doing grant allow *
Aug 29 08:25:34 b5 conmon[2648666]: debug 2021-08-28T22:25:34.792+0000 7f74f223a700 20  allow all

...and repeats constantly
----------------------------------------------------------------------
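To put a number on the loop in the excerpt above, the "got newer/committed monmap epoch" messages can be counted and the ones where both epochs are equal singled out. This is a sketch only: the pattern is taken from the log text quoted in this thread, and `monmap_updates` and `stuck_epochs` are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern copied from the debug_mon=20 output quoted in this thread.
EPOCH_RE = re.compile(r"got newer/committed monmap epoch (\d+), mine was (\d+)")


def monmap_updates(log_text):
    """Return [(their_epoch, my_epoch), ...] in the order they appear."""
    return [(int(theirs), int(mine)) for theirs, mine in EPOCH_RE.findall(log_text)]


def stuck_epochs(updates):
    """Count updates whose 'newer' epoch equals the epoch already held."""
    return Counter(theirs for theirs, mine in updates if theirs == mine)


if __name__ == "__main__":
    sample = (
        "e0  got newer/committed monmap epoch 35, mine was 0\n"
        "e35  got newer/committed monmap epoch 35, mine was 35\n"
        "e35  got newer/committed monmap epoch 35, mine was 35\n"
    )
    updates = monmap_updates(sample)
    print(updates)
    print(dict(stuck_epochs(updates)))
```

A steadily growing count for a single epoch is the signature described in the message: the mon keeps treating an identical monmap as new and restarts bootstrap each time.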
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

