Re: ceph-users Digest, Vol 50, Issue 1

Thank you for your response. :)

The version was Jewel, 10.2.2.  And yes, I did restart the monitors, with no change in results.


For the record, here's the problem.  It was a multi-pool cluster, and the CRUSH rules had an inappropriately large number on the step chooseleaf line.  I won't get into details because it would raise more questions than it would answer.  But enough OSDs went down that there was no valid placement for some PGs.  Nevertheless, the monitors continued searching for a solution (based on the chooseleaf parameter applied to each placement group).
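In case it isn't obvious where that number lives, here is roughly what a decompiled rule looks like; the rule name and the firstn value below are illustrative, not our actual rule:

    # dump and decompile the current CRUSH map (needs responsive monitors;
    # crushtool -d also works on any previously saved map file)
    ceph osd getcrushmap -o /tmp/crushmap.bin
    crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

    # a rule in the decompiled text looks something like this:
    # rule replicated_ruleset {
    #         ruleset 0
    #         type replicated
    #         min_size 1
    #         max_size 10
    #         step take default
    #         step chooseleaf firstn 10 type host   <-- the number in question
    #         step emit
    # }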


The monitors were spending all their time running CRUSH calculations, and then calling a new election whenever a timeout expired.
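In case it helps anyone in a similar state: the monitor admin socket still answers even when normal cluster commands hang on authentication, so the election state and debug levels can be checked locally with something like this (the mon id is a placeholder):

    # shows whether this mon thinks it is leader/peon/electing, and the election epoch
    ceph daemon mon.<id> mon_status

    # paxos and other counters on the spinning monitor
    ceph daemon mon.<id> perf dump

    # raise paxos logging on the fly (same effect as debug_paxos = 10)
    ceph daemon mon.<id> config set debug_paxos 10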


That's the cause of the problem.   Maybe I'll post a solution if and when we get out of this state.
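In the meantime, for anyone searching the archives, the generic way to swap out a bad rule looks roughly like this once the monitors accept commands again; I'm not claiming this is the exact sequence we'll end up using, and the rule number and replica count below are placeholders:

    ceph osd getcrushmap -o /tmp/crushmap.bin
    crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

    # edit the offending 'step chooseleaf firstn N type ...' line, recompile,
    # and check that the new map can actually place PGs before injecting it
    crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
    crushtool -i /tmp/crushmap.new --test --show-bad-mappings --rule 0 --num-rep 3

    ceph osd setcrushmap -i /tmp/crushmap.new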


Sadly, my fault.  It might be nice to get a warning when you try to do something really stupid like that.


On 03/01/2017 03:02 PM, ceph-users-request@xxxxxxxxxxxxxx wrote:

Date: Tue, 28 Feb 2017 23:52:26 +0000
From: Joao Eduardo Luis <joao@xxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  monitors at 100%; cluster out of service
Message-ID: <d7904f61-63fc-e59c-e31e-791dd4e05506@xxxxxxx>
Content-Type: text/plain; charset=windows-1252; format=flowed

On 02/28/2017 09:53 PM, WRIGHT, JON R (JON R) wrote:
> I currently have a situation where the monitors are running at 100% CPU,
> and can't run any commands because authentication times out after 300
> seconds.
>
> I stopped the leader, and the resulting election picked a new leader,
> but that monitor shows exactly the same behavior.
>
> Now both monitors *think* they are the leader and call new elections
> against the third monitor, both winning each time.   Essentially they
> alternate between calling an election (which they win) and then pegging
> one of the CPUs at 100%.
>
> strace suggests that the monitor daemons are spending the "pegged" time
> in user space, and attaching a debugger to the running process suggests
> that the monitor is spending its time doing crushmap calculations in
> fn_monstore.
>
> Setting debug_paxos to 10 produces this log message:
>
> 2017-02-28 16:50:49.503712 7f218ccd4700  7
> mon.hlxkvm001-storage@0(leader).paxosservice(osdmap 1252..1873) _active
> creating new pending
>
> during the time when the monitor process is pegged at 100%.
>
> The problem started when one of the hosts running a peon was rebooted,
> but didn't have the correct mtu setting in /etc/network/interfaces.
> The problem showed up after correcting the mtu value.
>
> Also, we are using a hyperconverged architecture where the same host
> runs a monitor and multiple OSDs.
>
> Any thoughts on recovery would be greatly appreciated.
What version is this?

How many monitors are you running?

Are the monitors consuming an unusual amount of memory? What about the 
OSDs in the same nodes?

Is the size of the monitor stores abnormally high?

Have you tried restarting all monitors to see if they hit the same issue?

   -Joao

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
