The OSDs expect the underlying filesystem to keep their data clean and
fail-crash in order to prevent accidentally introducing corruption into
the system. There's some ongoing work to make that a little friendlier,
but it's not done yet.
-Greg

On Wed, Mar 20, 2013 at 11:55 AM, Dyweni - Ceph-Devel
<YS3fpFE2ykfB@xxxxxxxxxx> wrote:
> Hi All,
>
> I would like to understand how Ceph handles and recovers from bad
> blocks. Would someone mind explaining this to me? It wasn't very
> apparent from the docs.
>
> My ultimate goal is to get some extra life out of my disks after I
> detect that they may be failing. (I'm talking about disks that may
> have a small number of bad blocks, but otherwise seem fine and still
> perform well.)
>
> Here's what I've put together:
>
> 1. BBR Hardware
>    - All hard disks come with a set number of blocks that are reserved
> for remapping failed blocks. This is handled transparently by the hard
> disk. The disk may not begin reporting failed blocks until all of the
> reserved blocks are used up.
>
> 2. BBR Device Mapper Target
>    - Back in the EVMS days, IBM wrote a kernel module (dm-bbr) and an
> EVMS plugin to manage that kernel module. I have updated that kernel
> module to work with the 3.6.11 kernel. I have also rewritten some
> portions of the EVMS plugin as a standalone bash script that lets me
> initialize the BBR layer and start the BBR device mapper target on top
> of it. (So far it seems to run fine, but it needs more testing.)
>
> 3. BTRFS
>    - I've read that BTRFS can perform data scrubbing and repair damaged
> files from redundant copies.
>
> 4. CEPH
>    - I've read that Ceph can perform a deep scrub to find damaged
> copies. I assume that, given the distributed nature of Ceph, it can
> repair the damaged copy from the other OSDs.
>
> One thing I am not clear on: when BTRFS / Ceph finds damaged data, what
> does it do to prevent data from being written to the same area again?
>
> Also, I'm wondering whether any parts of my layered approach are
> redundant or unnecessary... For instance, if BTRFS marks the block bad
> internally, then perhaps the BBR DM target isn't needed...
>
> In my recent testing, I had the following setup:
>
>   Disk -> DM-Crypt -> DM-BBR -> BTRFS -> OSD
>
> When the OSD hit a bad block, the DM-BBR target successfully remapped
> it to one of its own reserved blocks, BTRFS then reported data
> corruption, and the OSD daemon crashed.
>
> --
> Thanks,
> Dyweni
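
For the btrfs side (point 3 in the quoted message): a scrub walks the
filesystem, verifies checksums, and rewrites blocks that fail verification
from a redundant copy when one exists. Below is a minimal sketch of driving
that from Python; it assumes btrfs-progs is installed, that the filesystem
actually has a redundant data profile (e.g. raid1), and it uses a made-up
mount point.

#!/usr/bin/env python
"""Minimal sketch: run a btrfs scrub in the foreground and show its result.

Assumptions: btrfs-progs is installed, and /srv/osd0 (a made-up example
path) is a mounted btrfs filesystem with enough redundancy (e.g. raid1)
for damaged blocks to be rewritten from a good copy.
"""
import subprocess

MOUNT_POINT = "/srv/osd0"  # hypothetical OSD data mount

# -B keeps `btrfs scrub start` in the foreground until the scrub finishes;
# a non-zero exit code means the scrub could not start or hit errors.
rc = subprocess.call(["btrfs", "scrub", "start", "-B", MOUNT_POINT])

# Show the scrub summary (errors found, corrected, uncorrectable).
print(subprocess.check_output(["btrfs", "scrub", "status", MOUNT_POINT]).decode())

if rc != 0:
    raise SystemExit("btrfs scrub exited with code %d" % rc)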
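
On the Ceph side (point 4): a deep scrub that finds disagreeing replicas
marks the placement group "inconsistent", and `ceph pg repair <pgid>` then
tells the OSDs to resolve the inconsistency from the remaining copies. A
rough sketch of automating that check follows, assuming the standard `ceph`
CLI and admin keyring are available and that `ceph health detail` lists
damaged PGs in its usual "pg <pgid> is ... inconsistent" form.

#!/usr/bin/env python
"""Rough sketch: find PGs flagged inconsistent by scrubbing and repair them.

Assumptions: the standard `ceph` CLI is on the PATH, the admin keyring is
readable, and `ceph health detail` reports damaged placement groups in its
usual "pg <pgid> is ... inconsistent ..." form.
"""
import re
import subprocess

# Matches lines such as: "pg 2.1f is active+clean+inconsistent, acting [0,2]"
INCONSISTENT = re.compile(r"^pg (\S+) is .*inconsistent", re.MULTILINE)


def inconsistent_pgs():
    """Return the IDs of placement groups whose copies no longer match."""
    detail = subprocess.check_output(["ceph", "health", "detail"]).decode()
    return INCONSISTENT.findall(detail)


def repair(pgid):
    """Ask Ceph to rebuild the damaged copy for this placement group."""
    subprocess.check_call(["ceph", "pg", "repair", pgid])


if __name__ == "__main__":
    for pgid in inconsistent_pgs():
        print("repairing pg %s" % pgid)
        repair(pgid)

One caveat: repair has historically favoured the primary's copy of an
object, so it's worth checking which replica is actually damaged (e.g. via
the OSD logs) before repairing blindly.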