It's definitely ceph-mgr that is struggling here. It uses 100% of a CPU for several tens of seconds and reports the following in its log a few times before anything gets displayed:

Traceback (most recent call last):
  File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 88, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 649, in inner
    ret = func(*args, **kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 842, in wrapper
    return func(*vpath, **params)
  File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 270, in list
    return self._rbd_list(pool_name)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 261, in _rbd_list
    status, value = self._rbd_pool_list(pool)
  File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 232, in run
    raise ViewCacheNoDataException()
ViewCacheNoDataException: ViewCache: unable to retrieve data

----- On 5 Apr, 2019, at 5:06 PM, Wes Cilldhaire wes@xxxxxxxxxxx wrote:

> Hi Lenz,
>
> Thanks for responding. I suspected that the number of rbd images might have had
> something to do with it, so I cleaned up old disposable VM images I am no longer
> using, taking the list down from ~30 to 16: 2 in the EC pool on hdds and the
> rest on the replicated ssd pool. They vary in size from 50GB to 200GB. I don't
> have the # of objects per rbd on hand right now, but maybe this is a factor as
> well, particularly with 'du'. This doesn't appear to have made a difference in
> the time and number of attempts required to list them in the dashboard.
>
> I suspect it might be a case of "du on all images is always going to take longer
> than the current dashboard timeout", in which case the behaviour of the
> dashboard might need to change to account for this, maybe by fetching and
> listing the images in parallel and asynchronously. As it stands, the dashboard
> isn't really usable for managing existing images, which is a shame, because
> having that ability makes Ceph accessible to our clients who are considering it
> and begins affording them some level of self-service - one of the reasons we've
> been really excited for Mimic's release. I really hope I've just done something
> wrong :)
>
> I'll try to isolate which process the delay is coming from, and collect other
> useful metrics, when I'm back on that network tonight.
>
> Thanks,
> Wes
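
PS: to test the "du on all images takes longer than the dashboard timeout" theory, I plan to time the per-image scan outside ceph-mgr. A minimal sketch with the python-rados/python-rbd bindings; the conffile path and the pool name 'rbd-ssd' are placeholders for my setup, so adjust both:

# Rough standalone timing sketch, not the dashboard's own code.
# Assumes python-rados and python-rbd are installed; POOL is a
# hypothetical pool name, repeat for each pool that holds images.
import time

import rados
import rbd

POOL = 'rbd-ssd'

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        for name in rbd.RBD().list(ioctx):
            start = time.time()
            used = [0]  # mutable cell so the callback can accumulate (py2)

            def on_extent(offset, length, exists):
                if exists:
                    used[0] += length

            with rbd.Image(ioctx, name, read_only=True) as image:
                # diff_iterate with no 'from' snapshot roughly approximates
                # what 'rbd du' does; it is much slower on images without
                # the fast-diff feature enabled
                image.diff_iterate(0, image.size(), None, on_extent)
            print('%s: ~%d MiB used, %.1fs' % (
                name, used[0] // (1024 * 1024), time.time() - start))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

If each image takes more than a second or two here, 16 images scanned back to back could easily outlive the dashboard's view cache window, which would be consistent with the ViewCacheNoDataException above.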