I'm going to have to leave most of these questions for somebody else, but I do have one question. Are you using btrfs compression on your OSD backing filesystems?
-Greg

On Tue, Oct 9, 2012 at 12:43 PM, Dave (Bob) <dave@xxxxxxxxxxxxxxxxxx> wrote:
> I have a problem with this leveldb corruption issue. My logs show the
> same failure as is shown in Ceph's redmine as bug #2563.
>
> I am using linux-3.6.0 (x86_64) and ceph-0.52.
>
> I am using btrfs on my 4 OSDs. Each OSD is using a partition on a disk
> drive; there are 4 disk drives, all on the same machine.
>
> Each of these OSD partitions takes up the bulk of its disk. There are also
> partitions that provide for booting and a root filesystem from which
> Linux runs.
>
> The mon and mds are running on the same machine.
>
> I have been tracking Ceph releases for about a year; this is my Ceph
> test machine.
>
> Ceph clearly hammers the disk system, btrfs, and Linux. Things have
> moved a long way over the past six months, from a time when things would
> crash horribly within a short time to the point where it almost works.
>
> I have had a lot of trouble with the 'slow response' messages associated
> with the OSDs, but linux-3.6.0 seems to have brought noticeable
> improvements in btrfs. I am also tuning 'dirty_background_ratio', and I
> think that this will help.
>
> With my current configuration, I can leave Ceph and my OSDs churning
> data for days on end, and the only errors that I get are the leveldb
> 'std::__throw_length_error' pattern. The OSDs go down and can't be
> brought back up.
>
> I have compiled the 'check.cc' program that I found by following the bug
> #2563 links. I copy the omap directory from my broken OSD (current or
> snaps), run the check on it, and get:
>
> terminate called after throwing an instance of 'std::length_error'
>
> In the past, I've had only one OSD at a time go down in this way, and
> I've re-created a btrfs filesystem and allowed Ceph to regenerate.
> Now I have been working with only 3 OSDs, and two have gone down
> simultaneously. I've been amazed at Ceph's ability to repair itself, but
> I think that this is not going to be recoverable.
>
> On the Ceph redmine, it says:
>
> * *Status* changed from /New/ to /Can't reproduce/
>
> I can reproduce this time and time again. From my perspective, it looks
> like the final obstacle before I can be confident that all I have to do
> is optimise my hardware and configuration to make things faster.
>
> What can we do to fix this problem?
>
> Is there anything that I can do to recover my broken OSDs without
> recreating them afresh and losing the data?
>
> David Humphreys
> Datatone Ltd
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html