Bad Blocks

Hi All,

I would like to understand how Ceph handles and recovers from bad blocks. Would someone mind explaining this to me? It wasn't very apparent from the docs.

My ultimate goal is to get some extra life out of my disks after I detect that they may be failing. (I'm talking about disks that have a small number of bad blocks, but otherwise seem fine and still perform well.)

Here's what I've put together:

1. BBR Hardware
- All hard disks ship with a pool of spare blocks reserved for remapping failed blocks. The remapping is handled transparently by the drive, and the drive may not report failed blocks to the OS until the spare pool is used up.
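For reference, here is roughly how I watch the drive's own view of this (a sketch assuming smartmontools is installed; /dev/sdX is a placeholder):

  # Reallocated / pending sector counts show how much of the spare pool
  # the drive has already used.
  smartctl -A /dev/sdX | grep -Ei 'realloc|pending|uncorrect'

  # Run a long self-test and read back the results later.
  smartctl -t long /dev/sdX
  smartctl -l selftest /dev/sdX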

2. BBR Device Mapper Target
- Back in the EVMS days, IBM wrote a kernel module (dm-bbr) and an EVMS plugin to manage it. I have updated that kernel module to work with the 3.6.11 kernel, and I have rewritten portions of the EVMS plugin as a standalone bash script that initializes the BBR layer and starts the BBR device-mapper target on it. (So far it seems to run fine, but it needs more testing.)
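After the script builds the target, I sanity-check it with plain dmsetup (the device name "bbr0" is just whatever my script happens to create):

  # Show how the device-mapper devices stack on top of each other.
  dmsetup ls --tree

  # Dump the table line the script generated and the target's status,
  # which is where any remap activity from dm-bbr should show up.
  dmsetup table bbr0
  dmsetup status bbr0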

3. BTRFS
- I've read that BTRFS can perform data scrubbing and repair damaged files from redundant copies.
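This is the kind of scrub I mean (the mount point /srv/osd0 is a placeholder for the btrfs filesystem backing the OSD):

  # Foreground scrub (-B) with per-device statistics (-d).
  btrfs scrub start -Bd /srv/osd0

  # Or run it in the background and poll for checksum errors.
  btrfs scrub start /srv/osd0
  btrfs scrub status /srv/osd0

As I understand it, on a single-device filesystem only data that has a redundant copy (e.g. DUP metadata) can actually be repaired; plain data errors are just reported.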

4. CEPH
- I've read that CEPH can perform a deep scrub to find damaged copies. I assume that, given CEPH's distributed nature, it can repair the damaged copy from the other OSDs.
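These are the commands I've been poking at for that (the OSD id and PG id are placeholders):

  # Ask one OSD to deep-scrub everything it holds.
  ceph osd deep-scrub 0

  # Or deep-scrub / repair a single placement group.
  ceph pg deep-scrub 2.1f
  ceph pg repair 2.1f

  # Inconsistent PGs show up here.
  ceph health detail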

One thing I am not clear on: when BTRFS / CEPH finds damaged data, what does it do to prevent new data from being written to the same area?

Also, I'm wondering whether any parts of my layered approach are redundant / unnecessary... For instance, if BTRFS marks the block bad internally, then perhaps the BBR DM target isn't needed...


In my testing recently, I had the following setup:
  Disk -> DM-Crypt -> DM-BBR -> BTRFS -> OSD

When the OSD hit a bad block, the DM-BBR target successfully remapped it to one of its own reserved blocks, BTRFS then reported data corruption, and the OSD daemon crashed.
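In case it helps, this is roughly how I inspected things afterwards (the device name and OSD log path are assumptions based on my setup):

  # Confirm the crypt -> bbr -> btrfs layering is what I think it is.
  lsblk /dev/sdX

  # What btrfs and the OSD logged when the bad block was hit.
  dmesg | grep -Ei 'btrfs|csum'
  tail -n 50 /var/log/ceph/ceph-osd.0.log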


--
Thanks,
Dyweni
--

