Re: domino-style OSD crash

Tommi Virtanen <tv@xxxxxxxxxxx> · Mon, 9 Jul 2012 12:48:56 -0700

On Mon, Jul 9, 2012 at 12:05 PM, Yann Dupont <Yann.Dupont@xxxxxxxxxxxxxx> wrote:
>> The information here isn't enough to say whether the cause of the
>> corruption is btrfs or LevelDB, but the recovery needs to be handled by
>> LevelDB -- and upstream is working on making it more robust:
>> http://code.google.com/p/leveldb/issues/detail?id=97
>
> Yes, I saw this. It's very important. Sometimes, s... happens. Given
> the size Ceph volumes can reach, having a tool to restart damaged nodes (for
> whatever reason) is a must.
>
> Thanks for the time you took to answer. It's much clearer for me now.

If it doesn't recover, you re-format the disk and thereby throw away
the contents. Not really all that different from handling hardware
failure. That's why we have replication.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html