Re: ceph crash hangs forever and recovery stop

Paul Emmerich <paul.emmerich@xxxxxxxx> · Thu, 30 Apr 2020 17:14:18 +0200

Best guess: the recovery process doesn't actually stop; the mgr is dead
and simply no longer reports the progress.

And yeah, I can confirm that having a huge number of crash reports is a
problem (had a case where a monitoring script crashed due to a
radosgw-admin bug... lots of crash reports)
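
One way to check whether the crash module is the piece that hangs, and to trim a large backlog of reports, is to wrap the ceph crash commands in a time limit. This is only a rough sketch under assumptions: the 30/60 second limits and the 7-day retention are arbitrary, bounded and RUN_CEPH_CHECKS are made-up names for illustration, and it relies on GNU timeout returning 124 when the limit is hit. ceph crash stat and ceph crash prune <keep> are the stock crash-module commands; if the mgr is stuck, they will time out as well.

```shell
#!/usr/bin/env bash
# Run a command with a time limit so a hung mgr module cannot block the shell.
# Returns the exit status of GNU timeout: 124 if the limit was reached,
# otherwise the exit status of the command itself.
bounded() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
}

# Only touch the cluster when explicitly asked to (hypothetical switch).
if [ "${RUN_CEPH_CHECKS:-0}" = "1" ]; then
    rc=0
    bounded 30 ceph crash stat || rc=$?
    if [ "$rc" -eq 124 ]; then
        # Assumption: a timeout here points at the mgr, so restarting it
        # (systemctl restart ceph-mgr.target on the active manager) may help.
        echo "ceph crash stat did not answer within 30s" >&2
    elif [ "$rc" -eq 0 ]; then
        # Remove crash reports older than 7 days (retention is an assumption).
        bounded 60 ceph crash prune 7
    else
        echo "ceph crash stat failed with exit status $rc" >&2
    fi
fi
```

Run it with RUN_CEPH_CHECKS=1 on a node that has an admin keyring; without that variable the script only defines the helper.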

Paul

On Thu, Apr 30, 2020 at 4:09 PM Francois Legrand <fleg@xxxxxxxxxxxxxx>
wrote:

> Hi everybody (again),
> We recently had a lot of OSD crashes (more than 30 OSDs crashed). This is
> now fixed, but it triggered a huge rebalancing and recovery.
> At about the same time, we noticed that ceph crash ls (or any other
> ceph crash command) hangs forever and never returns.
> Finally, the recovery process stops regularly (after ~1 hour), but it
> can be restarted by restarting the mgr daemon (systemctl restart
> ceph-mgr.target on the active manager).
> There is nothing in the logs (the manager still works, the service is
> up, and the dashboard is accessible; the recovery simply stops).
> We also tried rebooting the managers, but that doesn't solve the problem.
> I guess these two problems are linked, but I'm not sure.
> Does anybody have a clue?
> Thanks.
> F.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx