iostat and dashboard freezing

Dear All,

We have a new Nautilus (14.2.2) cluster, with 328 OSDs spread over 40 nodes.

Unfortunately "ceph iostat" spends most of it's time frozen, with
occasional periods of working normally for less than a minute, then
freeze again for a couple of minutes, then come back to life, and so so
on...

No errors are seen on screen, unless I press CTRL+C when iostat is stalled:

[root@ceph-s3 ~]# ceph iostat
^CInterrupted
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1263, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1194, in main
    verbose)
  File "/usr/bin/ceph", line 619, in new_style_command
    ret, outbuf, outs = do_command(parsed_args, target, cmdargs, sigdict, inbuf, verbose)
  File "/usr/bin/ceph", line 593, in do_command
    return ret, '', ''
UnboundLocalError: local variable 'ret' referenced before assignment
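
For what it's worth, the traceback itself looks like a cosmetic bug in
/usr/bin/ceph rather than the root cause: 'ret' is only assigned once
the mgr replies, so a Ctrl+C that lands while the client is blocked
skips the assignment. A minimal sketch of that pattern, as I read it
(my reconstruction, not the real ceph code; wait_for_mgr_reply is a
made-up stand-in):

import time

def wait_for_mgr_reply():
    # stand-in for the client blocking on a reply from a stalled mgr
    time.sleep(3600)
    return 0

def do_command():
    try:
        ret = wait_for_mgr_reply()
    except KeyboardInterrupt:
        print('Interrupted')
    # if we were interrupted before 'ret' was bound, this raises
    # UnboundLocalError: local variable 'ret' referenced before assignment
    return ret, '', ''

do_command()

Pressing Ctrl+C while that sleeps reproduces the same UnboundLocalError,
i.e. the traceback just shows where the client was stuck waiting, not
the underlying fault.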

Observations:

1) This problem does not seem to be related to load on the cluster.

2) When iostat is stalled, the dashboard is also unresponsive; when
iostat is working, the dashboard also works.

Presumably the iostat and dashboard problems are due to the same
underlying fault? Perhaps a problem with the mgr?
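
One thing we intend to try is failing the active mgr over to a standby,
to see whether the stall follows the daemon:

ceph mgr fail ceph-s3

(assuming ceph-s3 is the currently active mgr)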


3) With iostat working, tailing /var/log/ceph/ceph-mgr.ceph-s3.log
shows:

2019-08-27 09:09:56.817 7f8149834700  0 log_channel(audit) log [DBG] : from='client.4120202 -' entity='client.admin' cmd=[{"width": 95, "prefix": "iostat", "poll": true, "target": ["mgr", ""], "print_header": false}]: dispatch

4) When iostat isn't working, we see no obvious errors in the mgr log.

5) When the dashboard is not working, the mgr log sometimes shows:

2019-08-27 09:18:18.810 7f813e533700  0 mgr[dashboard] [::ffff:10.91.192.36:43606] [GET] [500] [2.724s] [jake] [1.6K] /api/health/minimal
2019-08-27 09:18:18.887 7f813e533700  0 mgr[dashboard] ['{"status": "500 Internal Server Error", "version": "3.2.2", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "traceback": "..."}']

The escaped traceback in that second entry decodes to:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line 656, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py", line 188, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cptools.py", line 221, in wrap
    return self.newhandler(innerfunc, *args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 88, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py", line 34, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 649, in inner
    ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/health.py", line 192, in minimal
    return self.health_minimal.all_health()
  File "/usr/share/ceph/mgr/dashboard/controllers/health.py", line 51, in all_health
    result['pools'] = self.pools()
  File "/usr/share/ceph/mgr/dashboard/controllers/health.py", line 167, in pools
    pools = CephService.get_pool_list_with_stats()
  File "/usr/share/ceph/mgr/dashboard/services/ceph_service.py", line 124, in get_pool_list_with_stats
    'series': [i for i in stat_series]
RuntimeError: deque mutated during iteration
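
If I read that last error correctly, it looks like a thread-safety
race: something in the mgr appends new samples to the stats deque while
the dashboard request is iterating over it. This toy sketch (my
illustration, not ceph code) usually reproduces the same exception
within a second or so:

import collections
import threading

stat_series = collections.deque(maxlen=1000)

def writer():
    # stand-in for the mgr thread appending fresh pool stats
    for i in range(1000000):
        stat_series.append(i)

t = threading.Thread(target=writer)
t.start()
try:
    while t.is_alive():
        series = [i for i in stat_series]  # same comprehension as the traceback
except RuntimeError as e:
    print(e)  # "deque mutated during iteration"
t.join()

If that is the cause, the fix presumably belongs in the dashboard/mgr
code (snapshotting the deque under the writer's lock), but I'd welcome
confirmation from anyone who knows the internals.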


6) IPv6 is normally disabled on our machines at the kernel level, via
grubby --update-kernel=ALL --args="ipv6.disable=1"

Disabling IPv6 this way broke the dashboard (giving "HEALTH_ERR Module
'dashboard' has failed: error('No socket could be created',)"), so we
re-enabled IPv6 on the mgr nodes only to fix that.
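
As a possible alternative to re-enabling IPv6, I believe the dashboard
can be pinned to an IPv4 address with the documented server_addr
setting (untested on our side):

ceph config set mgr mgr/dashboard/server_addr <mgr-ipv4-address>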


Ideas...?

Should IPv6 be enabled, even if not configured, on all Ceph nodes?

Any ideas on fixing this gratefully received!

many thanks

Jake

-- 
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
