Unfixable corruption in ceph cluster

Daniel Poelzleithner <poelzleithner@xxxxxxxxxxxxx> · Fri, 07 Feb 2014 22:19:33 +0100

Hi,

we are experiencing a strange corruption in our Ceph cluster that makes
it impossible to restart all of its nodes. One node always crashes when
some pg gets replicated.
As far as I understood the admin, if the crashing node is cleared
completely, it syncs, but then some other node crashes instead.

I think a similar bug has already been filed:
http://tracker.ceph.com/issues/6101#note-7

Removing the rados block did not fix the problem.

In my opinion the bug is severe, as it shows that some internal
corruption seems to be triggered by a network failure and leaves the
cluster permanently broken and unfixable.

Could someone please take a look at it?
I will try to provide additional information as required.

Kind regards,
 Daniel