ceph-mgr: failed to retrieve mon information, and an exception is raised

Hi,

We set up a 3-node cluster (v13.2.4) with 3 MONs running and encountered
a strange issue: mon metadata cannot be retrieved from the restful API
mon endpoint, but the "ceph mon metadata" command shows it correctly.
In this state, the following exception is raised when accessing the mon
endpoint:

<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete
your request.  Either the server is overloaded or there is an error in
the application.</p>

2019-07-30 17:28:22.539 7fd52a2ef700 -1 mgr get_metadata_python Requested missing service mon.Host3
2019-07-30 17:28:22.550 7fd52a2ef700  0 mgr[restful] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 570, in __call__
    self.handle_request(req, resp)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 508, in handle_request
    result = controller(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/decorators.py", line 35, in decorated
    return f(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/api/mon.py", line 39, in get
    return context.instance.get_mons()
  File "/usr/lib64/ceph/mgr/restful/module.py", line 500, in get_mons
    mon['server'] = self.get_metadata("mon", mon['name'])['hostname']
TypeError: 'NoneType' object has no attribute '__getitem__'
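
The traceback boils down to get_metadata() returning None for mon.Host3
(matching the "Requested missing service mon.Host3" line just before it),
and that None then being subscripted with ['hostname']. A minimal Python 3
sketch of a guarded lookup that avoids the crash, assuming a hypothetical
stand-in lookup callable that returns a metadata dict or None; this is not
the restful module's actual code:

```python
def mon_servers(mons, get_metadata):
    """Attach a 'server' hostname to each mon, tolerating missing metadata.

    get_metadata is a hypothetical stand-in for the mgr's lookup: it takes
    (svc_type, name) and returns a metadata dict, or None if unknown.
    """
    result = []
    for mon in mons:
        mon = dict(mon)  # copy so the caller's dicts are not mutated
        meta = get_metadata("mon", mon["name"])
        # Subscripting None here is what raises the TypeError in the
        # traceback; fall back to None instead of failing the whole request.
        mon["server"] = meta.get("hostname") if meta else None
        result.append(mon)
    return result


if __name__ == "__main__":
    # Simulated state: the mgr knows Host1 and Host2 but not Host3.
    known = {"Host1": {"hostname": "Host1"}, "Host2": {"hostname": "Host2"}}
    lookup = lambda svc_type, name: known.get(name)
    mons = [{"name": "Host1"}, {"name": "Host2"}, {"name": "Host3"}]
    for m in mon_servers(mons, lookup):
        print(m["name"], m["server"])
```

A guard like this would only hide the symptom in the API response; the
underlying question of why the mgr has no metadata for the mon remains.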


Also, lots of "unhandled message" entries spam the syslog of the active MGR (on Host2):

2019-07-30 17:28:22.936 7fd52caf4700  0 ms_deliver_dispatch: unhandled message 0x55e44f783800 mgrreport(mon.Host3 +53-0 packed 798) v6 from mon.? 192.168.2.118:0/612
2019-07-30 17:28:23.447 7fd52caf4700  0 ms_deliver_dispatch: unhandled message 0x55e44f54d200 mgrreport(mon.Host1 +53-0 packed 798) v6 from mon.2 192.168.2.202:0/1967658


After raising debug_mgr from 1/5 to 20, it seems that no MON metadata
is recorded in the DaemonState, so all further updates are ignored:

2019-07-30 18:04:50.784 7fd52caf4700  4 mgr.server handle_open from 0x55e44de6aa00  mon,Host1
2019-07-30 18:04:50.786 7fd52caf4700  4 mgr.server handle_report from 0x55e44de6aa00 mon,Host1
2019-07-30 18:04:50.786 7fd52caf4700  5 mgr.server handle_report rejecting report from mon,Host1, since we do not have its metadata now.
2019-07-30 18:04:50.786 7fd52caf4700 10 mgr.server handle_report unregistering osd.-1  session 0x55e44d219ba0 con 0x55e44de6aa00
2019-07-30 18:04:50.786 7fd52caf4700  0 ms_deliver_dispatch: unhandled message 0x55e4509b4600 mgrreport(mon.Host1 +53-0 packed 798) v6 from mon.2 192.168.2.202:0/1967658
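
Reading the debug log, the mgr appears to accept a daemon's reports only
once it holds that daemon's metadata; otherwise the report is rejected and
the message falls through to ms_deliver_dispatch as "unhandled". A toy
Python model of that reading, for illustration only (the class and method
names are hypothetical and this is not Ceph's C++ mgr code):

```python
class ToyDaemonServer:
    """Toy model: reports are accepted only for daemons whose metadata is known."""

    def __init__(self):
        self.metadata = {}  # (svc_type, name) -> metadata dict
        self.log = []

    def add_metadata(self, svc_type, name, meta):
        self.metadata[(svc_type, name)] = meta

    def handle_report(self, svc_type, name):
        key = (svc_type, name)
        if key not in self.metadata:
            # Mirrors "rejecting report from mon,Host1, since we do not
            # have its metadata now." in the debug log above.
            self.log.append("rejecting report from %s,%s" % key)
            return False  # caller treats the message as unhandled
        self.log.append("accepted report from %s,%s" % key)
        return True


if __name__ == "__main__":
    server = ToyDaemonServer()
    server.add_metadata("mon", "Host2", {"hostname": "Host2"})
    for name in ("Host1", "Host2"):
        if not server.handle_report("mon", name):
            print("unhandled message from mon.%s" % name)
```

In this model nothing ever fills in the missing metadata, so every
periodic report from that mon is rejected again, which would be consistent
with the repeating log lines.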

I tried restarting the MON and MGR on Host1, but the "unhandled
message" logs keep showing up on the active MGR.  Is there any way to
fix this or to get rid of the "unhandled message" entries?  Is it
related to the inconsistent mon metadata issue?  Thanks.

- Jerry
