Re: More ddf container woes

 Hi Neil,

I updated to the git version (devel) and tried my "old" tricks again (the rough commands are sketched after the steps below):

- Created a container with 5 disks
- Created two raid sets (raid 1 md0 and raid 5 md1) in this container

mdadm -E /dev/md127 shows all disks active/Online

- Failed one disk in md0

mdadm -E /dev/md127 shows this disk as active/Offline, Failed

- Failed one disk in md1

mdadm -E /dev/md127 shows this disk as active/Offline, Failed

- Added a new spare disk to the container

mdadm -E /dev/md127 shows this new disk as active/Online, Rebuilding

This looks good, but although the container now has six disks, the most
recently failed disk is missing: mdadm -E /dev/md127 only shows five disks
(including the rebuilding one).

This time, however, only one of the failed raid sets is rebuilding, so that fix is OK.
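
For reference, the command sequence for this test was roughly as follows.
The device names, the exact create options and the 2+3 split of disks over
the two sets are only illustrative, not a literal transcript:

  mdadm -C /dev/md127 -e ddf -n 5 /dev/sd[b-f]   # create the DDF container from 5 disks
  mdadm -C /dev/md0 -l 1 -n 2 /dev/md127         # RAID 1 set inside the container
  mdadm -C /dev/md1 -l 5 -n 3 /dev/md127         # RAID 5 set inside the container
  mdadm /dev/md0 --fail /dev/sdb                 # fail one disk in md0
  mdadm /dev/md1 --fail /dev/sdd                 # fail one disk in md1
  mdadm /dev/md127 --add /dev/sdg                # add a new spare disk to the container
  mdadm -E /dev/md127                            # examine the container metadata after each step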

Here is another scenario with strange implications (again, the rough commands are sketched at the end):

- Created a container with 6 disks

mdadm -E /dev/md127 shows all 6 disks as Global-Spare/Online

- Removed one of the disks, as I only needed 5

This time mdadm -E /dev/md127 shows six physical disks, one of which has no device file associated with it

- Created two raid sets (raid 1 md0 and raid 5 md1) in this container

mdadm -E /dev/md127 shows all disks as active/Online, except the "empty
entry", which stays Global-Spare/Online

- Failed two disks, one in each raid array

mdadm -E /dev/md127 shows these two disks as active/Offline, Failed

- Added back the disk I removed earlier; it should fit into the empty slot shown by mdadm -E

mdadm -E /dev/md127 shows something very strange, namely:
-> All disks are now set to Global-Spare/Online
-> All device files are removed from the slots in mdadm -E, except the
newly added one, which shows the correct device
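
Again for reference, this scenario was roughly the following (device names
are examples; the removed and re-added disk is shown here as /dev/sdg):

  mdadm -C /dev/md127 -e ddf -n 6 /dev/sd[b-g]   # create the DDF container from 6 disks
  mdadm /dev/md127 --remove /dev/sdg             # remove one disk, leaving 5
  mdadm -C /dev/md0 -l 1 -n 2 /dev/md127         # RAID 1 set inside the container
  mdadm -C /dev/md1 -l 5 -n 3 /dev/md127         # RAID 5 set inside the container
  mdadm /dev/md0 --fail /dev/sdb                 # fail one disk in md0
  mdadm /dev/md1 --fail /dev/sdd                 # fail one disk in md1
  mdadm /dev/md127 --add /dev/sdg                # add back the earlier removed disk
  mdadm -E /dev/md127                            # examine the container metadata after each step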

Albert

On 03/15/11 05:43 AM, NeilBrown wrote:
On Mon, 14 Mar 2011 10:00:17 +0100 Albert Pauw <albert.pauw@xxxxxxxxx> wrote:

   Hi Neil,

Thanks, and yes, I noticed that with the new git stuff some problems are now fixed.

I noticed one more thing:

When I look at the end of the "mdadm -E /dev/md127" output I see it
mentions the number of physical disks. When I fail a disk it is marked as
"active/Offline, Failed", which is good. When I remove it, the number of
physical disks reported by the "mdadm -E" command stays the same: the
RefNo is still there, the Size is still there, the Device file is removed,
and the state is still "active/Offline, Failed". The whole entry should be
removed and the number of physical disks lowered by one.
Well... maybe.  Probably.

The DDF spec "requires" that there be an entry in the "physical disks"
table for every disk that is connected to the controller - whether failed
or not.
That makes some sense when you think about a hardware-RAID controller.
But how does that apply when DDF is running on a host system rather than
a RAID controller??
Maybe we should only remove them when they are physically unplugged??

There would probably be value in thinking through all of this a lot more,
but for now I have arranged to remove any failed device that is not
part of an array (even a failed part).

You can find all of this in my git tree.  I decided to back-port the
code from devel-3.2 which deletes devices from the DDF when you remove
them from a container - so you should find the code in the 'master'
branch works as well as that in 'devel-3.2'.
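
In other words, removing a failed disk from the container, e.g. (device
name purely illustrative):

  mdadm /dev/md127 --remove /dev/sdb

should now also delete that disk's entry from the DDF metadata.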

I would appreciate any more testing results that you come up with.



When I re-add the failed disk (but have NOT zeroed the superblock), the
state is still "active/Offline, Failed", yet the disk is reused for
resyncing a failed RAID set.

Assuming that the failed state of a disk is also recorded in the
superblock on the disk, three different cases are likely when adding a
disk:

- A clean new disk is added: a new superblock is created with a new RefNo
- A failed disk is added: use the failed state and RefNo
- A good disk is added, possibly from a good RAID set: use this superblock
  with its RefNo and status, and make it possible to reassemble the RAID
  set when all the disks are added
It currently seems to preserve the 'failed' state.  While that may
not be ideal, it is not clearly 'wrong' and can be worked around
by zeroing the metadata.

So I plan to leave it as it is for the moment.
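
For reference, the workaround amounts to something like this (device name
is only an example):

  mdadm --zero-superblock /dev/sdb    # wipe the stale DDF metadata from the disk
  mdadm /dev/md127 --add /dev/sdb     # add it back to the container as a fresh spare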

I hope to put a bit of time into sorting out some of these more subtle
issues next week - so you could well see progress in the future ...
especially if you have a brilliant idea about how it *should* work and manage
to convince me :-)



Thanks for the fixes so far,
And thank you for your testing.

NeilBrown



