Re: monitor not joining quorum

On Thu, Oct 21, 2021 at 3:56 AM Denis Polom <denispolom@xxxxxxxxx> wrote:
>
> Hi
>
> I did, the problematic mon was syncing. But the issue is that while it's
> syncing, Ceph becomes unreachable. It looks like it tries to become the
> leader with an unsynced db, which may make Ceph inaccessible?

Can you describe more carefully the scenario you're running? What does
"ceph mon dump" show when the cluster is working?
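(For reference, the sync/quorum state is also exposed in the JSON from
`ceph daemon mon.<id> mon_status`. Below is a minimal sketch of reading it;
the field names ("name", "rank", "state", "quorum") follow recent Ceph
releases, and the sample data is purely illustrative, not from a real cluster.)

```python
# Minimal sketch: interpret the JSON from `ceph daemon mon.<id> mon_status`.
# Field names assumed per recent Ceph releases; sample data is illustrative.

def mon_summary(status: dict) -> str:
    """One-line summary of a monitor's quorum/sync state."""
    name = status.get("name", "?")
    state = status.get("state", "unknown")
    quorum = status.get("quorum", [])
    if state == "synchronizing":
        # A synchronizing mon cannot join quorum or lead elections yet.
        return f"mon.{name} is synchronizing and cannot join quorum yet"
    if status.get("rank") in quorum:
        return f"mon.{name} is in quorum as {state} (ranks: {quorum})"
    return f"mon.{name} is {state}, outside quorum (ranks: {quorum})"

# A monitor stuck syncing, as in this thread:
stuck = {"name": "ceph1", "rank": 0, "state": "synchronizing", "quorum": []}
print(mon_summary(stuck))  # → mon.ceph1 is synchronizing and cannot join quorum yet
```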

The first log snippet you provided said the election was happening
because "accept timeout, calling fresh election" — this means the
leader (which was apparently mon.ceph3) tried to commit an update and
didn't get acks in time.

The later one from the synchronizing monitor just says it's
synchronizing, and I don't think it's possible for that monitor to ask
to be the leader until synchronizing completes.

My guess about what's happening here is that your mon.ceph2 is
overloaded trying to get ceph1 up to date, and can't commit to disk
fast enough. That shouldn't happen, but maybe your VMs don't have
enough resources for the default tunings?
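(If that's the case, the relevant mon timeouts can be relaxed. A hedged
sketch of a ceph.conf fragment follows; option names are as of
Nautilus-era releases and the values are illustrative, so verify both
against your version's documentation before applying.)

```ini
[mon]
# The accept timeout that triggered the fresh election is
# mon_accept_timeout_factor * mon_lease (defaults 2.0 and 5 s), so
# raising either gives a slow or overloaded leader more time to
# collect acks before peons call a new election.
mon lease = 10
mon accept timeout factor = 4.0
# Smaller sync payloads reduce the load on the mon that is feeding
# a synchronizing peer (value illustrative).
mon sync max payload size = 4096
```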
-Greg

>
>
> On 10/20/21 15:30, Michael Moyles wrote:
> > Have you checked sync status and progress?
> >
> > A mon_status command on the leader and the problematic monitor should
> > show if any sync is in progress. When datastores (/var/lib/ceph/mon/ by
> > default) get large, the sync can take a long time with the default
> > sync settings, and it must complete before a mon will join the quorum
> >
> >   ceph daemon mon.ceph1/3 mon_status
> >
> > Mike
> >
> >
> > On Wed, 20 Oct 2021 at 07:58, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
> >
> >     Do you have any backfilling operations?
> >     In our case, once backfilling was done the mon joined the quorum
> >     immediately
> >
> >
> >     k
> >
> >     Sent from my iPhone
> >
> >     > On 20 Oct 2021, at 08:52, Denis Polom <denispolom@xxxxxxxxx> wrote:
> >     >
> >     > 
> >     > Hi,
> >     >
> >     > I've checked it: there is no IP address collision, the ARP
> >     tables are OK, the MTU too, and according to tcpdump no packets
> >     are being lost.
> >     >
> >     >
> >     >
> >     > On 10/19/21 21:36, Konstantin Shalygin wrote:
> >     >> Hi,
> >     >>
> >     >>> On 19 Oct 2021, at 21:59, Denis Polom <denispolom@xxxxxxxxx>
> >     wrote:
> >     >>>
> >     >>> 2021-10-19 16:22:07.629 7faec9dd2700  1
> >     mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign
> >     global_id
> >     >>> 2021-10-19 16:22:08.193 7faec8dd0700  1
> >     mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign
> >     global_id
> >     >>> 2021-10-19 16:22:09.565 7faec8dd0700  1
> >     mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign
> >     global_id
> >     >>> 2021-10-19 16:22:11.885 7faec8dd0700  1
> >     mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign
> >     global_id
> >     >>> 2021-10-19 16:22:14.233 7faec8dd0700  1
> >     mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign
> >     global_id
> >     >>> 2021-10-19 16:22:14.889 7faec8dd0700  1
> >     mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign
> >     global_id
> >     >>> 2021-10-19 16:22:16.365 7faec8dd0700  1
> >     mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign
> >     global_id
> >     >>>
> >     >>> any idea how to get this monitor to join the quorum?
> >     >>
> >     >> We caught this issue a couple of weeks ago - it should be a
> >     network issue. First check for IP address collisions, ARP
> >     problems, packet losses, and MTU mismatches
> >     >>
> >     >>
> >     >>
> >     >>
> >     >> k
> >     _______________________________________________
> >     ceph-users mailing list -- ceph-users@xxxxxxx
> >     To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >




