Ceph Bug #2563

"Dave (Bob)" <dave@xxxxxxxxxxxxxxxxxx> · Tue, 09 Oct 2012 20:43:03 +0100

I am hitting the leveldb corruption issue. My logs show the same
failure as the one recorded in Ceph's redmine as bug #2563.

I am using linux-3.6.0 (x86_64) and ceph-0.52.

I am using btrfs on my 4 osds. Each osd uses a partition on a disk
drive; there are 4 disk drives, all in the same machine.

Each of these osd partitions takes up the bulk of its disk. There are
also partitions that provide for booting and a root filesystem from
which linux runs.

The mon and mds are running on the same machine.

I have been tracking Ceph releases for about a year; this is my ceph
test machine.

Ceph clearly hammers the disk system, btrfs, and linux. Things have
moved a long way over the past six months, from a time when everything
would crash horribly within a short time to the point where it almost
works.

I have had a lot of trouble with the 'slow response' messages associated
with the osds, but linux-3.6.0 seems to have brought noticeable
improvements in btrfs. I am also tuning 'dirty_background_ratio', and I
think that this will help.
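
For reference, the setting mentioned above is exposed on Linux as
/proc/sys/vm/dirty_background_ratio (the vm.dirty_background_ratio
sysctl). A minimal Python sketch for reading and writing it follows. The
helper names are illustrative assumptions, no particular value is being
recommended, and writing to the real file needs root:

```python
from pathlib import Path

# Standard Linux procfs location of the vm.dirty_background_ratio sysctl.
DIRTY_BACKGROUND_RATIO = Path("/proc/sys/vm/dirty_background_ratio")


def read_ratio(path=DIRTY_BACKGROUND_RATIO):
    """Return the current setting as an integer percentage."""
    return int(Path(path).read_text().strip())


def write_ratio(percent, path=DIRTY_BACKGROUND_RATIO):
    """Write a new percentage (needs root for the real procfs file)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    Path(path).write_text(f"{percent}\n")


# Example (hypothetical helper names, as noted above):
#   print("vm.dirty_background_ratio =", read_ratio())
```

A value written this way does not survive a reboot, so a setting that
proves useful would still have to be made persistent separately.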

With my current configuration, I can leave ceph and my osds churning
data for days on end, and the only errors that I get are the leveldb
'std::__throw_length_error' pattern. The osds go down and can't be
brought back up.

I have compiled the 'check.cc' program that I found by following the
links from bug #2563. I copy the omap directory from my broken osd (from
current or from snaps), run the check on it, and get:

terminate called after throwing an instance of 'std::length_error'
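
The copy step described above can be sketched in Python as follows, so
that the check runs against a scratch copy and never touches the osd's
own store. The function name and its arguments are assumptions made for
illustration; only the omap directory under current (or under a
snapshot directory) comes from the description above:

```python
import shutil
from pathlib import Path


def copy_omap(osd_data, scratch, subdir="current"):
    """Copy <osd_data>/<subdir>/omap to <scratch>/omap; return the new path.

    osd_data is the osd's data directory (site-specific, an assumption);
    subdir is "current" or the name of one of the snapshot directories.
    """
    src = Path(osd_data) / subdir / "omap"
    if not src.is_dir():
        raise FileNotFoundError(f"no omap directory at {src}")
    dst = Path(scratch) / "omap"
    # copytree refuses to overwrite an existing destination, so an
    # earlier copy is never silently clobbered.
    shutil.copytree(src, dst)
    return dst
```

The check program is then run against the returned directory rather than
against the live osd.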

In the past, I've had only one osd at a time go down in this way, and
I've re-created its btrfs filesystem and allowed ceph to regenerate the
data. Now I have been working with only 3 osds, and two have gone down
simultaneously. I've been amazed at ceph's ability to repair itself, but
I don't think that this is going to be recoverable.

On the ceph redmine, it says:

  * *Status* changed from /New/ to /Can't reproduce/

I can reproduce this time and time again. From my perspective, it looks
like the last obstacle to my being confident that all I have left to do
is optimise my hardware and configuration to make things faster.

What can we do to fix this problem?

Is there anything that I can do to recover my broken osds without
recreating them afresh and losing the data?

David Humphreys
Datatone Ltd
