Re: RFC - new raid superblock layout for md driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 03, 2002 at 08:38:25AM +1100, Neil Brown wrote:
1) Building a mirror is essentially just copying large amounts of data
   around, exactly what is needed to implement move functionality for
   arbitrarily remapping volumes.  (see
   http://people.sistina.com/~thornber/pvmove_outline.txt).
Building a mirror may be just moving data around. But the interesting
issues in raid1 are more about maintaining a mirror:  read balancing,
retry on error, hot spares, etc.
true, that's why LVM (dm) should use md for the raid work.

2) Extending raid 5 volumes becomes very simple if they are dm targets
   since you just add another segment, this new segment could even
   have different numbers of stripes.  eg,

But is this something that you would *want* to do???

To my mind, the raid1/raid5 almost always lives below any LVM or
partitioning scheme.  You use raid1/raid5 to combine drives (real,
physical drives) into virtual drives that are more reliable, and then
you partition them or whatever you want to do.  raid1 and raid5 on top
of LVM bits just doesn't make sense to me.
well to me does:
- you might want to split a mirror of a portion of data for backup
purposes (when snapshots won't do) or for safety before attempting a
risky operation.
- you might also want to have different raid strategies for different
data. Think a medium sized storage with oracle, you might want to do
a fast mirror for online redo logs(1) and raid5 for datafiles.(2)
- you might want to add mirroring after having put data on your disks
and the current way to do it with MD on partitions is complex, with
LVM over MD is really hard to do right.
- stackable devices are harder to maintain, a single interface to deal
with mirroring and volume management would be easier.
- we wont have any more problems with 'switching cache buffer size' :))))

(1) yes i know they are mirrored by oracle, but having a fs unavailable
due to disk failure is a pita anyway
(2) a dba will tell you to use different disks, but i never found
anyone willing to use 4 73Gb disks for redo logs

[[ I just had this really sick idea of creating a raid level that did
data duplication (aka mirroring) for the first N stripes, and
I had another sick idea of teaching lilo how to do raid5, but it won't
fit in 512b. anyway for the normal MD on partitions case creating one
n-way raid1 for /boot and raid5 for the rest

I really think the raid1/raid5 parts of MD are distinctly different
from DM, and they should remain separate.  However I am quite happy to
improve the interfaces so that seamless connections can be presented
by user-space tools.
reading this it looks like that the only way dm could get raid is
reimplementing or duplicating code from existing md, thus duplicating
code in the kernel.

To summarise:  If you want tigher integration between MD and DM, do it
by defining useful interfaces, not by trying to tie them both together
into one big lump.
we can think of md split in those major areas
1 the superblock interface, which i believe we all agree should go to
 user mode for all the array setup function, and should keep the
 portion for updating superblock in kernel space.
2 the raid logic code
3 the interface to lower block device layer
4 the interface to upper block device layer
(in md these 3 are thightly coupled)

some of these areas overlap with dm and it could be possible to merge
the duplicated functionality.

having said that and having looked 'briefly' at the code i believe that
doing something like this would mean reworking completely the logic
behind md, and adding some major parts to dm, or better to a separate
module.

in my idea we will have
a core that handles request mapping
metadata plugins for both md superblock format and lvm metadata (those
       would deal with keeping the metadata current with the array
       current status)
layout plugins for raid?, striping, linear, multipath (does this belong
       here or at a different level?)

L.

--
Luca Berra -- bluca@comedia.it
       Communication Media & Services S.r.l.
/"\
\ /     ASCII RIBBON CAMPAIGN
 X        AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux