Dear All,

We have a new Nautilus (14.2.2) cluster, with 328 OSDs spread over 40 nodes.

Unfortunately "ceph iostat" spends most of its time frozen, with occasional periods of working normally for less than a minute; it then freezes again for a couple of minutes, comes back to life, and so on...

No errors are seen on screen, unless I press CTRL+C when iostat is stalled:

[root@ceph-s3 ~]# ceph iostat
^CInterrupted
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1263, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1194, in main
    verbose)
  File "/usr/bin/ceph", line 619, in new_style_command
    ret, outbuf, outs = do_command(parsed_args, target, cmdargs, sigdict, inbuf, verbose)
  File "/usr/bin/ceph", line 593, in do_command
    return ret, '', ''
UnboundLocalError: local variable 'ret' referenced before assignment

Observations:

1) This problem does not seem to be related to load on the cluster.

2) When iostat is stalled, the dashboard is also non-responsive; when iostat is working, the dashboard also works. Presumably the iostat and dashboard problems are due to the same underlying fault? Perhaps a problem with the mgr?

3) With iostat working, tailing /var/log/ceph/ceph-mgr.ceph-s3.log shows:

2019-08-27 09:09:56.817 7f8149834700 0 log_channel(audit) log [DBG] : from='client.4120202 -' entity='client.admin' cmd=[{"width": 95, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch

4) When iostat isn't working, we see no obvious errors in the mgr log.
5) When the dashboard is not working, mgr log sometimes shows:

2019-08-27 09:18:18.810 7f813e533700 0 mgr[dashboard] [::ffff:10.91.192.36:43606] [GET] [500] [2.724s] [jake] [1.6K] /api/health/minimal
2019-08-27 09:18:18.887 7f813e533700 0 mgr[dashboard] ['{"status": "500 Internal Server Error", "version": "3.2.2", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "traceback": "Traceback (most recent call last):\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\", line 656, in respond\\n response.body = self.handler()\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line 188, in __call__\\n self.body = self.oldhandler(*args, **kwargs)\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/_cptools.py\\", line 221, in wrap\\n return self.newhandler(innerfunc, *args, **kwargs)\\n File \\"/usr/share/ceph/mgr/dashboard/services/exception.py\\", line 88, in dashboard_exception_handler\\n return handler(*args, **kwargs)\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\", line 34, in __call__\\n return self.callable(*self.args, **self.kwargs)\\n File \\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 649, in inner\\n ret = func(*args, **kwargs)\\n File \\"/usr/share/ceph/mgr/dashboard/controllers/health.py\\", line 192, in minimal\\n return self.health_minimal.all_health()\\n File \\"/usr/share/ceph/mgr/dashboard/controllers/health.py\\", line 51, in all_health\\n result[\'pools\'] = self.pools()\\n File \\"/usr/share/ceph/mgr/dashboard/controllers/health.py\\", line 167, in pools\\n pools = CephService.get_pool_list_with_stats()\\n File \\"/usr/share/ceph/mgr/dashboard/services/ceph_service.py\\", line 124, in get_pool_list_with_stats\\n \'series\': [i for i in stat_series]\\nRuntimeError: deque mutated during iteration\\n"}']

6) IPv6 is normally disabled on our machines at the kernel level, via grubby --update-kernel=ALL
--args="ipv6.disable=1"
However, disabling IPv6 interfered with the dashboard (giving "HEALTH_ERR Module 'dashboard' has failed: error('No socket could be created',)"), so we re-enabled IPv6 on the mgr nodes only to fix this.

Ideas...? Should IPv6 be enabled, even if not configured, on all Ceph nodes?

Any ideas on fixing this gratefully received!

many thanks

Jake

--
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
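
For reference, the "RuntimeError: deque mutated during iteration" in observation 5 is the generic error Python raises when a collections.deque is changed while something is iterating over it. The sketch below is only a minimal illustration of that error, not the dashboard code: the function names and the sample data are hypothetical, and the "writer" is simulated in the same loop rather than in another thread. It also shows the usual mitigation of iterating over a snapshot copy; whether that is the right fix for the mgr is an assumption.

```python
from collections import deque


def copy_while_mutating(series):
    """Iterate a deque while appending to it, as a concurrent writer might.

    Returns the error message if Python detects the mutation.
    """
    out = []
    try:
        for item in series:
            out.append(item)
            series.append(item)  # simulated writer changing the deque mid-iteration
    except RuntimeError as exc:
        return str(exc)
    return out


def copy_snapshot(series):
    """Copy the deque into a list first, then work on the copy."""
    return list(series)


if __name__ == "__main__":
    # Hypothetical sample data standing in for a stats series.
    print(copy_while_mutating(deque([1, 2, 3], maxlen=3)))
    print(copy_snapshot(deque([1, 2, 3], maxlen=3)))
```

If something like this is what happens in get_pool_list_with_stats(), it would suggest the stats series is being updated while the dashboard request is reading it, but that is a guess from the traceback alone.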