Re: ceph-mgr: failed to retrieve mon information and exception shows

Hi,

This turns out to be a known issue that has been fixed as of Ceph
v13.2.5: https://tracker.ceph.com/issues/38109.
We will cherry-pick the fix and check whether it works in our
environment. Thanks :)
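
For anyone hitting the same TypeError in the meantime: the traceback quoted
below fails because the metadata lookup returns None for a mon the mgr has
no record of, and the result is then indexed directly. A minimal sketch of a
defensive version follows. The FAKE_METADATA table, get_metadata stand-in and
annotate_mons helper are hypothetical names for illustration only, not the
restful module's code and not the upstream fix.

```python
# Hypothetical stand-in for the mgr's metadata store. In the traceback the
# real lookup is self.get_metadata("mon", mon['name']), which yielded None
# for mon.Host3.
FAKE_METADATA = {
    ("mon", "Host1"): {"hostname": "Host1"},
    ("mon", "Host2"): {"hostname": "Host2"},
}


def get_metadata(svc_type, svc_id):
    """Return the metadata dict, or None when the daemon is unknown."""
    return FAKE_METADATA.get((svc_type, svc_id))


def annotate_mons(mons):
    """Attach a 'server' field to each mon, tolerating missing metadata."""
    for mon in mons:
        metadata = get_metadata("mon", mon["name"])
        # Check for None before reading 'hostname' instead of indexing blindly.
        mon["server"] = metadata.get("hostname") if metadata else None
    return mons


mons = annotate_mons([{"name": "Host1"}, {"name": "Host3"}])
print(mons)
```

This only hides the 500 error from the endpoint; the missing metadata itself
still needs the real fix.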

- Jerry

On Tue, 30 Jul 2019 at 18:14, Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
>
> Hi,
>
> We set up a 3-node cluster (v13.2.4) with 3 MONs running and encountered
> a strange issue: mon metadata cannot be retrieved from the restful API
> mon endpoint, although the "ceph mon metadata" command shows it
> correctly.  In this state, the following exception appears when
> accessing the mon endpoint.
>
> <title>500 Internal Server Error</title>
> <h1>Internal Server Error</h1>
> <p>The server encountered an internal error and was unable to complete
> your request.  Either the server is overloaded or there is an error in
> the application.</p>
>
> 2019-07-30 17:28:22.539 7fd52a2ef700 -1 mgr get_metadata_python
> Requested missing service mon.Host3
> 2019-07-30 17:28:22.550 7fd52a2ef700  0 mgr[restful] Traceback (most
> recent call last):
>   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 570, in __call__
>     self.handle_request(req, resp)
>   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 508, in
> handle_request
>     result = controller(*args, **kwargs)
>   File "/usr/lib64/ceph/mgr/restful/decorators.py", line 35, in decorated
>     return f(*args, **kwargs)
>   File "/usr/lib64/ceph/mgr/restful/api/mon.py", line 39, in get
>     return context.instance.get_mons()
>   File "/usr/lib64/ceph/mgr/restful/module.py", line 500, in get_mons
>     mon['server'] = self.get_metadata("mon", mon['name'])['hostname']
> TypeError: 'NoneType' object has no attribute '__getitem__'
>
>
> Also, lots of "unhandled message" entries spam the syslog of the active MGR (on Host2):
>
> 2019-07-30 17:28:22.936 7fd52caf4700  0 ms_deliver_dispatch: unhandled
> message 0x55e44f783800 mgrreport(mon.Host3 +53-0 packed 798) v6 from
> mon.? 192.168.2.118:0/612
> 2019-07-30 17:28:23.447 7fd52caf4700  0 ms_deliver_dispatch: unhandled
> message 0x55e44f54d200 mgrreport(mon.Host1 +53-0 packed 798) v6 from
> mon.2 192.168.2.202:0/1967658
>
>
> After raising debug_mgr from 1/5 to 20, it seems that no MON metadata
> is recorded in the DaemonState, so any further updates are ignored.
>
> 2019-07-30 18:04:50.784 7fd52caf4700  4 mgr.server handle_open from
> 0x55e44de6aa00  mon,Host1
> 2019-07-30 18:04:50.786 7fd52caf4700  4 mgr.server handle_report from
> 0x55e44de6aa00 mon,Host1
> 2019-07-30 18:04:50.786 7fd52caf4700  5 mgr.server handle_report
> rejecting report from mon,Host1, since we do not have its metadata
> now.
> 2019-07-30 18:04:50.786 7fd52caf4700 10 mgr.server handle_report
> unregistering osd.-1  session 0x55e44d219ba0 con 0x55e44de6aa00
> 2019-07-30 18:04:50.786 7fd52caf4700  0 ms_deliver_dispatch: unhandled
> message 0x55e4509b4600 mgrreport(mon.Host1 +53-0 packed 798) v6 from
> mon.2 192.168.2.202:0/1967658
>
> I tried restarting the MON and MGR on Host1, but the "unhandled
> message" logs keep showing up on the active MGR.  Is there any way to
> fix or suppress the "unhandled message" logs?  Are they related to
> the inconsistent mon metadata issue?  Thanks.
>
> - Jerry
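
To see which daemons are behind the "unhandled message" entries quoted above,
the mgr log can be tallied by the sender named inside mgrreport(...). This is
a rough sketch that assumes the log format matches the excerpts in this
thread. The count_unhandled helper and the regular expression are
illustrative, not part of Ceph.

```python
import re
from collections import Counter

# Captures the daemon named inside mgrreport(...), e.g. "mon.Host3".
# The pattern is an assumption based on the log excerpts in this thread.
MGRREPORT_RE = re.compile(r"unhandled\s+message\s+\S+\s+mgrreport\((\S+)")


def count_unhandled(log_text):
    """Count 'unhandled message' mgrreport entries per reporting daemon."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(MGRREPORT_RE.findall(flat))


SAMPLE = """\
2019-07-30 17:28:22.936 7fd52caf4700  0 ms_deliver_dispatch: unhandled
message 0x55e44f783800 mgrreport(mon.Host3 +53-0 packed 798) v6 from
mon.? 192.168.2.118:0/612
2019-07-30 17:28:23.447 7fd52caf4700  0 ms_deliver_dispatch: unhandled
message 0x55e44f54d200 mgrreport(mon.Host1 +53-0 packed 798) v6 from
mon.2 192.168.2.202:0/1967658
"""

counts = count_unhandled(SAMPLE)
for daemon, n in sorted(counts.items()):
    print(daemon, n)
```

Comparing the daemons in this tally with the output of "ceph mon metadata"
should show whether the rejected reports all come from mons whose metadata the
mgr is missing.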


