Hi,

We're using Ceph v0.55, and last night we lost one node of our cluster. When it came back, Ceph started recovering, but since then the RADOS gateway has been unable to connect to the cluster. The gateway times out on initialization (somewhere in the radosclient connect).

The other problem (and I think it's related) is that the recovery isn't working. The OSDs hit OSD op thread timeouts, and sometimes some of the OSDs crash (see the attached stack trace). So it seems that our OSDs aren't staying up long enough for the recovery to proceed.

Any help would be appreciated.

Thanks,
-- Yann
Attachment: ceph.log