Okay, thanks for the information. Sam has walked through trying to fix
this before and I don't know if he came up with anything, but the use
of btrfs compression has been a common theme among those who have
reproduced this bug. I updated the ticket, but for now I'd recommend
leaving it off, as on the rest of your machines.

John, can you add a warning to whatever install/configuration docs are
appropriate?
-Greg

On Tue, Oct 9, 2012 at 12:50 PM, Dave (Bob) <dave@xxxxxxxxxxxxxxxxxx> wrote:
> Greg,
>
> Thank you very much for your prompt reply.
>
> Yes, I am using lzo compression, and autodefrag.
>
> David
>
>
> On 09/10/2012 20:45, Gregory Farnum wrote:
>> I'm going to have to leave most of these questions for somebody else,
>> but I do have one question. Are you using btrfs compression on your
>> OSD backing filesystems?
>> -Greg
>>
>> On Tue, Oct 9, 2012 at 12:43 PM, Dave (Bob) <dave@xxxxxxxxxxxxxxxxxx> wrote:
>>> I have a problem with this leveldb corruption issue. My logs show
>>> the same failure as is shown in Ceph's Redmine as bug #2563.
>>>
>>> I am using linux-3.6.0 (x86_64) and ceph-0.52.
>>>
>>> I am using btrfs on my 4 OSDs. Each OSD uses a partition on a disk
>>> drive; there are 4 disk drives, all in the same machine.
>>>
>>> Each of these OSD partitions takes up the bulk of its disk. There
>>> are also partitions that provide for booting and a root filesystem
>>> from which Linux runs.
>>>
>>> The mon and mds are running on the same machine.
>>>
>>> I have been tracking Ceph releases for about a year; this is my
>>> Ceph test machine.
>>>
>>> Ceph clearly hammers the disk system, btrfs, and Linux. Things have
>>> come a long way over the past six months, from a time when
>>> everything would crash horribly in short order to the point where
>>> it almost works.
>>>
>>> I have had a lot of trouble with the 'slow response' messages
>>> associated with the OSDs, but linux-3.6.0 seems to have brought
>>> noticeable improvements in btrfs. I am also tuning
>>> 'dirty_background_ratio', and I think that this will help.
>>>
>>> With my current configuration, I can leave Ceph and my OSDs
>>> churning data for days on end, and the only errors that I get are
>>> the leveldb 'std::__throw_length_error' pattern. The OSDs go down
>>> and can't be brought back up.
>>>
>>> I have compiled the 'check.cc' program that I found by following
>>> the bug #2563 links. When I copy the omap directory from my broken
>>> OSD (current or snaps) and run the check on it, I get:
>>>
>>> terminate called after throwing an instance of 'std::length_error'
>>>
>>> In the past, I've had only one OSD at a time go down in this way,
>>> and I've re-created a btrfs filesystem and allowed Ceph to
>>> regenerate. Now I have been working with only 3 OSDs and two have
>>> gone down simultaneously. I've been amazed at Ceph's ability to
>>> repair itself, but I think that this is not going to be
>>> recoverable.
>>>
>>> On the Ceph Redmine, it says:
>>>
>>> * *Status* changed from /New/ to /Can't reproduce/
>>>
>>> I can reproduce this time and time again. From my perspective it
>>> looks like the final obstacle to my being confident that all I have
>>> to do is optimise my hardware and configuration to make things
>>> faster.
>>>
>>> What can we do to fix this problem?
>>>
>>> Is there anything that I can do to recover my broken OSDs without
>>> recreating them afresh and losing the data?
>>>
>>> David Humphreys
>>> Datatone Ltd
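
A note for anyone following along: with btrfs, the compression and
autodefrag behaviour David describes comes from mount options, so
"leaving it off" just means dropping those options from the OSD
mounts. A minimal sketch, assuming the OSD data sits on /dev/sdb1
mounted at /var/lib/ceph/osd/ceph-0 (device and path are illustrative):

    # /etc/fstab entry matching David's setup (lzo compression + autodefrag):
    #/dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  noatime,compress=lzo,autodefrag  0 2

    # Entry with compression and autodefrag left off, per Greg's advice:
    /dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  noatime  0 2

    # Apply without recreating the filesystem:
    umount /var/lib/ceph/osd/ceph-0
    mount /var/lib/ceph/osd/ceph-0

Note that remounting without compress= does not decompress data
already on disk; it only stops new extents from being compressed.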
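
David also mentions tuning dirty_background_ratio. For reference,
that's the vm sysctl controlling when background writeback kicks in,
expressed as a percentage of memory; lowering it makes writeback start
earlier, so dirty data drains in smaller, steadier bursts instead of
big stalls. A sketch (the value 5 is illustrative, not a
recommendation from this thread):

    # Check the current value (the kernel default is typically 10):
    cat /proc/sys/vm/dirty_background_ratio

    # Start background writeback once dirty pages exceed 5% of memory:
    sysctl vm.dirty_background_ratio=5

    # Persist across reboots by adding to /etc/sysctl.conf:
    #   vm.dirty_background_ratio = 5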
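
For anyone else wanting to reproduce the check David ran: check.cc
presumably links against leveldb and walks the store. A sketch of
compiling and running it against a copy of the omap directory (the
paths, the OSD id, and the assumption that it takes the omap path as
its argument are mine, not from the bug tracker):

    # Build (assumes the leveldb headers and library are installed):
    g++ -o check check.cc -lleveldb -lpthread

    # Always work on a copy, so the broken store is preserved as evidence:
    cp -a /var/lib/ceph/osd/ceph-0/current/omap /tmp/omap-copy
    ./check /tmp/omap-copy

On a corrupted store this aborts with the same 'std::length_error'
David reports.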
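
And for completeness, the recreate-and-regenerate procedure David
describes for a single failed OSD looks roughly like this in the 0.5x
era (the device, OSD id, and init invocation are assumptions; treat
this as a sketch, not a tested recipe):

    # Mark the OSD out so the cluster rebalances away from it:
    ceph osd out 0
    /etc/init.d/ceph stop osd.0

    # Recreate the backing filesystem and remount it:
    umount /var/lib/ceph/osd/ceph-0
    mkfs.btrfs /dev/sdb1
    mount /dev/sdb1 /var/lib/ceph/osd/ceph-0

    # Reinitialise the OSD's data directory, then bring it back in:
    ceph-osd -i 0 --mkfs --mkkey
    /etc/init.d/ceph start osd.0
    ceph osd in 0

This only works while enough replicas survive elsewhere, which is
exactly why two of three OSDs failing simultaneously is likely
unrecoverable.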