Hi All,
I would like to understand how Ceph handles and recovers from bad
blocks. Would someone mind explaining this to me? It wasn't very
apparent from the docs.
My ultimate goal is to be able to get some extra life out of my disks
after I detect that they may be failing. (I'm talking about disks that
have a small number of bad blocks but otherwise seem fine and still
perform well.)
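For the detection step, here is a minimal sketch of reading the
reallocated and pending sector counters. It assumes smartmontools is
installed and that the disk reports the usual ATA attribute table via
`smartctl -A`. The function names and the "any non-zero counter is
suspect" policy are my own placeholders, not anything Ceph provides.

```python
import subprocess


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            continue  # raw value is not a plain integer; skip it
    return attrs


def read_smart_attributes(device):
    """Run smartctl on a device (typically needs root)."""
    # smartctl uses its exit status as a bit mask, so it is not checked here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


def looks_suspect(attrs,
                  watched=("Reallocated_Sector_Ct", "Current_Pending_Sector")):
    """Hypothetical policy: flag the disk if any watched counter is non-zero."""
    return any(attrs.get(name, 0) > 0 for name in watched)
```

Something like looks_suspect(read_smart_attributes("/dev/sdX")) could
then be run periodically, with /dev/sdX standing in for the real device.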
Here's what I've put together:
1. BBR Hardware
- All hard disks come with a set number of blocks that are reserved
for remapping of failed blocks. This is handled transparently by the
hard disk. The hard disk may not begin reporting failed blocks until
all the reserved blocks are used up.
2. BBR Device Mapper Target
- Back in the EVMS days, IBM wrote a kernel module (dm-bbr) and an
EVMS plugin to manage that kernel module. I have updated that kernel
module to work with the 3.6.11 kernel. I have also rewritten some
portions of the EVMS plugin as a standalone bash script that lets me
initialize the BBR layer and start the BBR device mapper target on that
layer. (So far it seems to run fine, but it requires more testing.)
3. BTRFS
- I've read that BTRFS can perform data scrubbing and repair
damaged files from redundant copies.
4. CEPH
- I've read that CEPH can perform a deep scrub to find damaged
copies. I assume by the distributed nature of CEPH, it can repair the
damaged copy from the other OSDs.
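The scrub steps in items 3 and 4 can also be triggered by hand. Below
is a minimal sketch that wraps the commands, assuming the `btrfs` and
`ceph` command-line tools are installed. The helper names are my own,
and the mount point and placement group ID in the usage note are
placeholders.

```python
import subprocess


def btrfs_scrub_cmd(mount_point):
    # -B: do not background; wait until the scrub has finished.
    return ["btrfs", "scrub", "start", "-B", mount_point]


def ceph_deep_scrub_cmd(pg_id):
    # Ask Ceph to deep-scrub one placement group.
    return ["ceph", "pg", "deep-scrub", pg_id]


def ceph_repair_cmd(pg_id):
    # Ask Ceph to repair a placement group that scrub flagged as inconsistent.
    return ["ceph", "pg", "repair", pg_id]


def run(cmd):
    """Run a command and return (exit_status, combined stdout and stderr)."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.returncode, result.stdout
```

For example, run(btrfs_scrub_cmd("/mnt/osd0")) followed by
run(ceph_deep_scrub_cmd("0.1f")), with both arguments replaced by real
values from the cluster.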
One thing I am not clear on is when BTRFS / CEPH finds damaged data,
what do they do to prevent data from being written to the same area?
Also, I'm wondering whether any parts of my layered approach are
redundant or unnecessary. For instance, if BTRFS marks the block bad
internally, then perhaps the BBR DM target isn't needed.
In my testing recently, I had the following setup:
Disk -> DM-Crypt -> DM-BBR -> BTRFS -> OSD
When the OSD hit a bad block, the DM-BBR target successfully remapped
it to one of its own reserved blocks. BTRFS then reported data
corruption, and the OSD daemon crashed.
--
Thanks,
Dyweni