Hi,

It turns out to be a known issue, which has been fixed since Ceph v13.2.5:
https://tracker.ceph.com/issues/38109. We will cherry-pick the fix and
check whether it works in our environment. Thanks :)

- Jerry

On Tue, 30 Jul 2019 at 18:14, Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
>
> Hi,
>
> We set up a 3-node cluster (v13.2.4) with 3 MONs running and encountered
> a strange issue: the mon metadata cannot be retrieved from the RESTful
> API mon endpoint, but the "ceph mon metadata" command shows it correctly.
> Under this condition, the following exception is returned when accessing
> the mon endpoint:
>
> <title>500 Internal Server Error</title>
> <h1>Internal Server Error</h1>
> <p>The server encountered an internal error and was unable to complete
> your request. Either the server is overloaded or there is an error in
> the application.</p>
>
> 2019-07-30 17:28:22.539 7fd52a2ef700 -1 mgr get_metadata_python Requested missing service mon.Host3
> 2019-07-30 17:28:22.550 7fd52a2ef700 0 mgr[restful] Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 570, in __call__
>     self.handle_request(req, resp)
>   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 508, in handle_request
>     result = controller(*args, **kwargs)
>   File "/usr/lib64/ceph/mgr/restful/decorators.py", line 35, in decorated
>     return f(*args, **kwargs)
>   File "/usr/lib64/ceph/mgr/restful/api/mon.py", line 39, in get
>     return context.instance.get_mons()
>   File "/usr/lib64/ceph/mgr/restful/module.py", line 500, in get_mons
>     mon['server'] = self.get_metadata("mon", mon['name'])['hostname']
> TypeError: 'NoneType' object has no attribute '__getitem__'
>
>
> Also, lots of "unhandled message" entries spam the syslog of the active
> MGR (on Host2):
>
> 2019-07-30 17:28:22.936 7fd52caf4700 0 ms_deliver_dispatch: unhandled message 0x55e44f783800 mgrreport(mon.Host3 +53-0 packed 798) v6 from mon.? 192.168.2.118:0/612
> 2019-07-30 17:28:23.447 7fd52caf4700 0 ms_deliver_dispatch: unhandled message 0x55e44f54d200 mgrreport(mon.Host1 +53-0 packed 798) v6 from mon.2 192.168.2.202:0/1967658
>
>
> After raising debug_mgr from 1/5 to 20, it seems that there is no MON
> metadata recorded in the DaemonState, so any further updates are
> ignored:
>
> 2019-07-30 18:04:50.784 7fd52caf4700 4 mgr.server handle_open from 0x55e44de6aa00 mon,Host1
> 2019-07-30 18:04:50.786 7fd52caf4700 4 mgr.server handle_report from 0x55e44de6aa00 mon,Host1
> 2019-07-30 18:04:50.786 7fd52caf4700 5 mgr.server handle_report rejecting report from mon,Host1, since we do not have its metadata now.
> 2019-07-30 18:04:50.786 7fd52caf4700 10 mgr.server handle_report unregistering osd.-1 session 0x55e44d219ba0 con 0x55e44de6aa00
> 2019-07-30 18:04:50.786 7fd52caf4700 0 ms_deliver_dispatch: unhandled message 0x55e4509b4600 mgrreport(mon.Host1 +53-0 packed 798) v6 from mon.2 192.168.2.202:0/1967658
>
> I tried restarting the MON and MGR on Host1, but the "unhandled message"
> logs keep showing up on the active MGR. Is there any way to fix or get
> rid of the "unhandled message" logs? Is this related to the inconsistent
> mon metadata issue? Thanks.
>
> - Jerry
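The traceback shows get_mons() indexing the result of self.get_metadata("mon", mon['name']) directly, so a mon the mgr has no metadata for (mon.Host3 in the log) becomes a TypeError and an HTTP 500. A minimal defensive sketch, written for Python 3 and assuming the lookup returns either a dict or None as the "Requested missing service" log line suggests, could look like this. annotate_mons, the stub lookup, and the sample hostnames are hypothetical illustrations, not the restful module's code or the upstream fix; they only mask the symptom and do not restore the missing metadata, which is what the tracker issue linked in the reply is about.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the mgr's get_metadata(): returns a dict for a
# known daemon, or None when the mgr has no record of it (assumed behaviour,
# matching the "Requested missing service mon.Host3" log line).
MetadataLookup = Callable[[str, str], Optional[Dict[str, str]]]


def annotate_mons(
    mons: List[Dict[str, str]], get_metadata: MetadataLookup
) -> List[Dict[str, Optional[str]]]:
    """Attach a 'server' field to each mon entry without crashing on missing metadata."""
    result: List[Dict[str, Optional[str]]] = []
    for mon in mons:
        entry: Dict[str, Optional[str]] = dict(mon)
        metadata = get_metadata("mon", mon["name"])
        # Indexing metadata['hostname'] directly raises TypeError when the
        # lookup returned None; fall back to None instead.
        entry["server"] = metadata.get("hostname") if metadata else None
        result.append(entry)
    return result


# Hypothetical usage: the mgr knows Host1 and Host2 but not Host3.
known = {"Host1": {"hostname": "host1"}, "Host2": {"hostname": "host2"}}
mons = [{"name": "Host1"}, {"name": "Host2"}, {"name": "Host3"}]
print(annotate_mons(mons, lambda svc, name: known.get(name)))
# [{'name': 'Host1', 'server': 'host1'}, {'name': 'Host2', 'server': 'host2'}, {'name': 'Host3', 'server': None}]
```

Returning None for the server field keeps the endpoint answering for the healthy mons, at the cost of silently hiding which mon is missing metadata; logging that case would be a reasonable addition.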