Re: [PATCH 00/18] Assorted md patches headed for 2.6.30

----- Message from neilb@xxxxxxx ---------
    Date: Mon, 16 Feb 2009 16:35:52 +1100
    From: Neil Brown <neilb@xxxxxxx>
 Subject: Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
      To: Bill Davidsen <davidsen@xxxxxxx>
Cc: Julian Cowley <julian@xxxxxxxx>, Keld Jorn Simonsen <keld@xxxxxxxx>, linux-raid@xxxxxxxxxxxxxxx

Ob. plug for raid5E: the advantages of raid5E are two-fold. The most
obvious is that head motion is spread over N+2 drives (N being the
number of data drives), which improves performance quite a bit in the
common small-business case of 4-5 drive setups. It also puts some use
on each drive, so you don't suddenly start using a drive which may have
been spun down for a month, or may have developed issues since SMART
was last run, etc.


Are you thinking of raid5e, where all the spare space is at the end of
the devices, or raid5ee where it is more evenly distributed?

raid5E I'd say.

So raid5e is just a normal raid5 where you don't use all of the space.
When a drive fails, you reshape to n-1 drives, thus absorbing the
spare space.
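
For illustration, a toy Python sketch of that capacity arithmetic
(hypothetical sizes, not mdadm code): each device may only be filled
far enough that the data still fits on a raid5 of n-1 drives after the
reshape.

def raid5e_usable_per_drive(n, size):
    # raid5 on n drives each filled to c holds (n-1)*c of data; after
    # reshaping to a full raid5 on n-1 drives it can hold (n-2)*size,
    # so (n-1)*c <= (n-2)*size, i.e. c <= (n-2)*size/(n-1)
    return (n - 2) * size / (n - 1)

n, size = 5, 1000  # e.g. 5 drives of 1000 GB each (hypothetical)
used = raid5e_usable_per_drive(n, size)
print(f"used per drive: {used:.0f} GB, spare: {size - used:.0f} GB")
print(f"data capacity:  {(n - 1) * used:.0f} GB")  # same as raid5 on 4 drives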

raid5ee is much like raid6, but you don't read or write the Q block.
If you lose a drive, you rebuild it in the space where the Q block
lives.
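
One way to see the layout is to sketch it. A toy Python illustration
(the rotation order here is made up; real md layouts differ), with
D = data, P = parity, and S = the spare slot where raid6 would keep Q:

def raid5ee_layout(n_drives, n_stripes):
    for s in range(n_stripes):
        p = (n_drives - 1 - s) % n_drives  # rotating parity slot
        q = (n_drives - s) % n_drives      # rotating spare (would-be Q) slot
        print(f"stripe {s}: " + " ".join(
            "P" if d == p else "S" if d == q else "D"
            for d in range(n_drives)))

raid5ee_layout(5, 5)

After a failure, the rebuild writes the missing blocks into the S
slots, leaving an ordinary raid5 with an unusual layout.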

So would you just use raid6 normally and transition to a contorted
raid5 on device failure?  Or would you really want to leave those
blocks fallow?

My understanding is that 5EE leaves those blocks empty. Doing real Q blocks would entail too much overhead, but it reminds me of an idea I had some time ago. I call it lazy-RAID6 ;)

Problem: you have enough disks to run RAID6, but you don't want to pay the performance penalty* of RAID6. The usual solution in those cases is RAID5 + hot spare, but maybe we can do better. We could also use the hot spare to store the RAID6 Q blocks, but calculate them (or more specifically, read/write the stripe/block) only when the disks are idle. This of course means that the hot spare will hold a number of invalid blocks after each write operation, but the majority of its blocks will be up to date. (Use a bitmap to mark dirty blocks and "clean up" when the disks are idle.)

The goal is to get basically the same performance as normal RAID5 but higher failure resilience. In my experience hard disks often fail partially, so if you have one partial and one complete disk failure, chances are you will be able to recover. Even when two disks fail completely, the number of dirty blocks should usually be pretty low, so we would be able to recover most of the data. With a single disk failure we behave like a normal RAID5 + (hot) spare, of course. It is not intended as a replacement for normal RAID6, but it would give most of your data about the same protection while maintaining the speed of RAID5.
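
To make the idea concrete, a minimal toy sketch in Python (a
hypothetical model, not md code; plain XOR stands in for the real
GF(2^8) Q calculation):

from functools import reduce

def q_syndrome(blocks):
    # stand-in only: real RAID6 Q is computed over GF(2^8), not plain XOR
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class LazyRaid6:
    def __init__(self):
        self.data = {}      # stripe number -> list of data blocks
        self.q = {}         # stripe number -> Q block on the hot spare
        self.dirty = set()  # the bitmap: stripes whose Q is stale

    def write(self, stripe, blocks):
        # fast path, RAID5 cost: write data (and P, not shown) and
        # merely mark Q as stale instead of updating it
        self.data[stripe] = blocks
        self.dirty.add(stripe)

    def idle_scrub(self):
        # background path: while the disks are idle, recompute Q for
        # dirty stripes and clear their bits; clean stripes can then
        # survive a double failure, just like RAID6
        while self.dirty:
            s = self.dirty.pop()
            self.q[s] = q_syndrome(self.data[s])

r = LazyRaid6()
r.write(0, [b"\x01\x02", b"\x03\x04", b"\x05\x06"])
print(sorted(r.dirty))  # [0] -> Q stale right after the write
r.idle_scrub()
print(sorted(r.dirty))  # []  -> Q caught up during idle time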

*) The main speed advantage of RAID5 over RAID6 comes from the fact that if you write one physical block**) in a RAID5, you only need to update***) one additional physical block (the parity). If you write a physical block in a RAID6, you have to read the whole stripe and then write the RAID6 chunk of the stripe as well.
**) A RAID chunk consists of several physical blocks; several chunks make up a stripe.
***) read + write
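
A toy Python illustration of that cost difference (XOR stands in for
the parity/syndrome math; not md code):

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(old_data, new_data, old_parity):
    # read old data + old parity, then write new data + new parity:
    # new P = old P xor old D xor new D
    return xor(xor(old_parity, old_data), new_data)

def raid6_small_write(stripe, idx, new_data):
    # without read-modify-write support for Q, a one-block update means
    # reading the whole stripe and recomputing Q over all data blocks
    blocks = list(stripe)
    blocks[idx] = new_data
    q = blocks[0]
    for b in blocks[1:]:
        q = xor(q, b)
    return blocks, q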

Ok, I hope no one can claim a patent on it now. ;)
Alex.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@xxxxxxxxxxx \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..
