Re: [PATCH 00/18] Assorted md patches headed for 2.6.30

Farkas Levente wrote:
NeilBrown wrote:
On Thu, February 12, 2009 8:42 pm, Farkas Levente wrote:
NeilBrown wrote:
Hi,
 following is my current patch queue for 2.6.30, in case anyone would
like to review or otherwise comment.
They should show up in -next shortly.

Probably the most interesting are the last few which provide support
for converting a raid1 into a raid5, and a raid5 into a raid6.
I plan to do some more work here so the code might change a bit before
final submission, as I work out how best to factor the code.

mdadm doesn't currently support these conversions, but you can
simply
   echo raid5 > /sys/block/md0/md/level
to change a 2-drive raid1 into a raid5.  Similarly for 5->6
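For the 5->6 case the same knob applies, e.g. (a sketch, assuming
/dev/md0 is currently a raid5 in a shape the takeover code accepts):

   echo raid6 > /sys/block/md0/md/level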
any plan for non-raid to raid1, or anything else? like in windows, one can
convert a normal partition into a mirrored one online.
No plan exactly, but I do think about it from time to time.

There are two problems with this, and solving just one of them
doesn't help you much.  So you really have to solve both at once,
which reduces the motivation towards either ....

One problem is the task of changing the implementation of the device
underneath the filesystem without the filesystem needing to care.

i.e. the filesystem opens block device 8,1 (/dev/sda1) and starts doing
IO, then mdadm steps in and magically changes things so that /dev/sda1
is now on a raid1 array which happens to access the same data, but
through a different code path.
Figuring out exactly which data structure to install the redirection in,
and how to do it in a way that is guaranteed to be safe, is non-trivial.

dm has a mechanism to change the implementation under a given dm
device, and md now has a mechanism to change the implementation
under a given md device.  But generalising that to 'any device' is
not entirely trivial.  Now that I have done it for md I'm in a better
position to understand how it might be done.
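As a rough illustration of the dm side of this (the device name and the
replacement table are made up; the point is just the standard dmsetup
suspend/load/resume dance):

   # freeze IO on the live dm device
   dmsetup suspend mydev
   # swap in a new table that reaches the same data a different way
   SECTORS=$(blockdev --getsz /dev/mapper/mydev)
   echo "0 $SECTORS linear /dev/sdb1 0" | dmsetup load mydev
   # let IO continue; the filesystem never noticed
   dmsetup resume mydev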

The other problem is where to store the metadata.  You need at least a
few bytes, and realistically 1K, of space on the devices that md is free
to use to record device state, so that arrays can be assembled correctly.

One idea I had was to get the filesystem to allocate a block and make that
available to md, then md would copy the data from the last block of the
device into that block and redirect all IO requests aimed at the
last block so that they really access the relocated block.  Then md puts
its metadata in that last block.
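Just to illustrate the redirection (this is not md code; dm linear
targets stand in for it, and the donated-block offset 123456 is made up):

   SECTORS=$(blockdev --getsz /dev/sda1)
   # every sector except the last maps straight through;
   # the last sector is redirected to the block the filesystem donated
   printf '%s\n' \
     "0 $((SECTORS - 1)) linear /dev/sda1 0" \
     "$((SECTORS - 1)) 1 linear /dev/sda1 123456" | dmsetup create reloc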

This could work but is a little too error prone for my liking.  e.g.
if you fsck the device, you suddenly lose your guarantee that
the filesystem isn't going to write to that relocation block.

I think it could only work if mdadm can inspect the device and ensure
that the last block isn't part of any partition, or any active filesystem.
This is possible, but messy.
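A sketch of that inspection, using the partition start/size files under
/sys/block (device name made up; counts are in 512-byte sectors):

   DEV=sda
   TOTAL=$(blockdev --getsz /dev/$DEV)
   LAST_END=0
   for p in /sys/block/$DEV/$DEV[0-9]*; do
       END=$(( $(cat $p/start) + $(cat $p/size) ))
       [ $END -gt $LAST_END ] && LAST_END=$END
   done
   echo "$((TOTAL - LAST_END)) unused sectors after the last partition"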

e.g. on my notebook, which has a 250Gig drive, whatever I used to partition
it (cfdisk?) insisted on using multiples of cylinders for partitions
(what an out-of-date concept!), and as the reported geometry is

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders

there are 5103 unused sectors at the end - plenty of room for
md to put some metadata.  But if someone else had used sfdisk,
I think they would find no spare space and be unhappy.
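The arithmetic, for anyone who wants to check:

   $ echo $(( 250059350016/512 - 255*63*30401 ))
   5103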

Maybe it is sufficient to support just those people who are
lucky enough to not be using the whole device...


So it might happen, but it is just a little too easy to stick this
one in the too-hard basket.

the main reason here is real life. i saw many cases where a system was
installed on one disk and later it'd be nice to make it redundant (as
most sysadmins said: it's not working on linux even though it works on
windows, where you just put in a new disk and make it a mirror).
so i don't know the technical details, but it would be a very useful feature.

I think you can get there for normal filesystem data by creating a raid1 on a new drive with the second member marked as missing (a degraded mirror). Then copy the data from the unmirrored drive to the mirrored f/s, unmount the original drive and mount the array in its place, and finally add the original drive to the new array, as sketched below. This is ugly, and a verified backup and restore is better, but it can be done.
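For the record, a sketch of that procedure with mdadm (device names,
filesystem and mount points are made up; adapt to taste):

   # create the mirror with the second member deliberately missing
   mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
   mkfs.ext3 /dev/md0
   mount /dev/md0 /mnt/new
   cp -a /data/. /mnt/new/            # copy everything across
   umount /mnt/new /data
   mount /dev/md0 /data               # the array takes over
   mdadm /dev/md0 --add /dev/sda1     # old disk becomes the second half

Note the array runs degraded until that final --add finishes its resync,
so there is a window with no redundancy at all.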

--
Bill Davidsen <davidsen@xxxxxxx>
 "Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck

