Re: (help!) MD RAID6 won't --re-add devices? [SOLVED!]


 



Thanks for the COW idea, had not thought of that. Luckily, I had 10 spare 2TB drives racked and powered, so I just backed up all the drives using dd.

Turns out a good way to test whether you've got the right combination of drives is to run echo check > /sys/block/mdX/md/sync_action, wait 5 seconds, and then read mismatch_cnt in the same directory. If you've found the right combination, the count will be low or zero.

Another important thing to note is that "Version" reported by mdadm --detail /dev/mdX is NOT always the same as version reported by mdadm --examine /dev/sdX. I guess array header and drive header track different version numbers. My array header was reporting 1.02 while all the drives were showing 1.2.

And a key thing to know is that the default Data Offset has CHANGED over the years. My original drives reported an offset of 272 sectors, and I believe the array was made with mdadm-2.6.6. Using mdadm-3.1.4 to create a new array put the offset at 2048 sectors, a huge change! Also, it seems when mdadm-3.1.4 added the old drives (272 offset at the time) into the array that was missing 5/10 drives and marked them as spares, the spare-marking process changed the offset to 384 sectors. The array when created with mdadm-3.1.4 had actually reduced the Used Dev Size a bit from what the original array had, so none of the permutations worked since everything was misaligned. I had to downgrade to mdadm-3.0 which created the array with the proper Dev Size and the proper Data Offset of 272 sectors for the RAID6 blocks to line up.

Is there documentation somewhere about all these default changes? I saw no options to specify the data offset either. That would be a good option to add.
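To compare the offsets above across drives, the sector counts can be pulled out of saved mdadm --examine output and converted to bytes. The following is only a minimal sketch and was not part of the recovery: parse_data_offset, sectors_to_bytes and SECTOR_BYTES are hypothetical names, and it assumes the examine text contains a line of the form "Data Offset : N sectors" and that the sectors are 512 bytes each.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_BYTES 512ULL  /* assumption: offsets are in 512-byte sectors */

/*
 * Scan mdadm --examine style text for a "Data Offset : N sectors" line.
 * Returns the offset in sectors, or -1 if no such line is found.
 */
long long parse_data_offset(const char *examine_text)
{
        const char *p = strstr(examine_text, "Data Offset :");
        long long sectors;

        if (p == NULL)
                return -1;
        if (sscanf(p, "Data Offset : %lld", &sectors) != 1)
                return -1;
        return sectors;
}

/* Convert a sector count to bytes. */
unsigned long long sectors_to_bytes(long long sectors)
{
        assert(sectors >= 0);
        return (unsigned long long)sectors * SECTOR_BYTES;
}
```

With these assumptions, 272 sectors comes out as 139264 bytes and 2048 sectors as 1048576 bytes (1 MiB), which is why stripes created with the newer default cannot line up with the old data.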

But best to add would be functional --re-add capability! Reporting the array is "busy" when I'm trying to return its 5 missing drives isn't useful. It should re-add its old drives as expected and flush any pending buffers.

Below is the (very hacky) code I used to test all the permutations of the 5 drives whose sequence was lost by being marked as spares. Hopefully it doesn't have to help anyone in the future.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

char *permutation[] = { "nopqr", "noprq", "noqpr", "noqrp", "norpq", "norqp", "npoqr", "nporq", "npqor", "npqro", "nproq", "nprqo", "nqopr", "nqorp", "nqpor", "nqpro", "nqrop", "nqrpo", "nropq", "nroqp", "nrpoq", "nrpqo", "nrqop", "nrqpo", "onpqr", "onprq", "onqpr", "onqrp", "onrpq", "onrqp", "opnqr", "opnrq", "opqnr", "opqrn", "oprnq", "oprqn", "oqnpr", "oqnrp", "oqpnr", "oqprn", "oqrnp", "oqrpn", "ornpq", "ornqp", "orpnq", "orpqn", "orqnp", "orqpn", "pnoqr", "pnorq", "pnqor", "pnqro", "pnroq", "pnrqo", "ponqr", "ponrq", "poqnr", "poqrn", "pornq", "porqn", "pqnor", "pqnro", "pqonr", "pqorn", "pqrno", "pqron", "prnoq", "prnqo", "pronq", "proqn", "prqno", "prqon", "qnopr", "qnorp", "qnpor", "qnpro", "qnrop", "qnrpo", "qonpr", "qonrp", "qopnr", "qoprn", "qornp", "qorpn", "qpnor", "qpnro", "qponr", "qporn", "qprno", "qpron", "qrnop", "qrnpo", "qronp", "qropn", "qrpno", "qrpon", "rnopq", "rnoqp", "rnpoq", "rnpqo", "rnqop", "rnqpo", "ronpq", "ronqp", "ropnq", "ropqn", "roqnp", "roqpn", "rpnoq", "rpnqo", "rponq", "rpoqn", "rpqno", "rpqon", "rqnop", "rqnpo", "rqonp", "rqopn", "rqpno", "rqpon" };

int main(void)
{
        size_t i;
        int mismatches, status;
        FILE *handle;
        char command[1024];

        for (i = 0; i < sizeof permutation / sizeof permutation[0]; i++) {
                mismatches = -1; // Stays -1 if mismatch_cnt cannot be read
                // Recreate the array with this ordering of the five unknown drives
                snprintf(command, sizeof command, "/sbin/mdadm --create /dev/md4 --assume-clean -R -e 1.2 -l 6 -n 10 -c 64 /dev/sda1 /dev/sd%c1 /dev/sdc1 /dev/sdd1 /dev/sd%c1 /dev/sdm1 /dev/sd%c1 /dev/sd%c1 /dev/sd%c1 /dev/sdb1", permutation[i][0], permutation[i][1], permutation[i][2], permutation[i][3], permutation[i][4]);
                printf("Running: %s\n", command);
                status = system(command);
                if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
                        printf("Command error\n");
                        return 1;
                }
                sleep(1);
                // Start a parity check and give it 5 seconds to find mismatches
                handle = fopen("/sys/block/md4/md/sync_action", "w");
                if (handle == NULL) {
                        perror("sync_action");
                        return 1;
                }
                fprintf(handle, "check\n");
                fclose(handle);
                sleep(5);
                handle = fopen("/sys/block/md4/md/mismatch_cnt", "r");
                if (handle == NULL) {
                        perror("mismatch_cnt");
                        return 1;
                }
                if (fscanf(handle, "%d", &mismatches) != 1)
                        mismatches = -1;
                fclose(handle);
                printf("Permutation %s = %d mismatches\n", permutation[i], mismatches);
                fflush(stdout);
                // Stop the array before trying the next ordering
                snprintf(command, sizeof command, "/sbin/mdadm --stop /dev/md4");
                printf("Running: %s\n", command);
                status = system(command);
                if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
                        printf("Command error\n");
                        return 1;
                }
                sleep(1);
        }
        return 0;
}

The permutations I got from an online permutation generator:

http://users.telenet.be/vdmoortel/dirk/Maths/permutations.html

Didn't feel like writing that part of the algorithm.
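For anyone who would rather not paste in a table, the orderings can also be generated in code. What follows is a minimal sketch, not what was actually used for the recovery: next_permutation and count_permutations are hypothetical helper names, and the approach is the standard "next lexicographic permutation" step applied to a sorted string of drive letters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Swap two characters in place. */
static void swap_chars(char *a, char *b)
{
        char t = *a;
        *a = *b;
        *b = t;
}

/*
 * Rearrange s (length n) into the next permutation in lexicographic order.
 * Returns 1 on success, 0 if s was already the last permutation.
 */
int next_permutation(char *s, size_t n)
{
        size_t i, j, lo, hi;

        if (n < 2)
                return 0;
        /* Walk left past the longest non-increasing tail. */
        i = n - 1;
        while (i > 0 && s[i - 1] >= s[i])
                i--;
        if (i == 0)
                return 0;
        i--; /* s[i] is the pivot: s[i] < s[i + 1] */
        /* Find the rightmost character larger than the pivot. */
        j = n - 1;
        while (s[j] <= s[i])
                j--;
        swap_chars(&s[i], &s[j]);
        /* Reverse the tail so it becomes the smallest possible ordering. */
        for (lo = i + 1, hi = n - 1; lo < hi; lo++, hi--)
                swap_chars(&s[lo], &s[hi]);
        return 1;
}

/* Visit every permutation of a sorted string; print each one if asked. */
size_t count_permutations(const char *sorted_letters, int print)
{
        char buf[16];
        size_t n = strlen(sorted_letters), count = 0;

        assert(n < sizeof buf);
        memcpy(buf, sorted_letters, n + 1);
        do {
                if (print)
                        puts(buf);
                count++;
        } while (next_permutation(buf, n));
        return count;
}
```

Calling count_permutations("nopqr", 1) prints the orderings one per line starting from "nopqr" and returns 120, so the loop in the test program could call next_permutation on a buffer instead of indexing the hard-coded array.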

--Bart


On 1/15/2011 4:05 PM, Jérôme Poulin wrote:
On Sat, Jan 15, 2011 at 2:50 PM, Bart Kus<me@xxxxxxxx>  wrote:
Some research has revealed a frightening solution:

http://forums.gentoo.org/viewtopic-t-716757-start-0.html

That thread calls upon mdadm --create with the --assume-clean flag.  It also
seems to reinforce my suspicions that MD has lost my device order numbers
when it marked the drives as spare (thanks, MD!  Remind me to get you a nice
christmas present next year.).  I know the order of 5 out of 10 devices, so
that leaves 120 permutations to try.  I've whipped up some software to
generate all the permuted mdadm --create commands.

The question now: how do I test if I've got the right combination?  Can I dd
a meg off the assembled array and check for errors somewhere?
I guess running a read-only fsck is the best way to prove it is working.

The other question: Is testing incorrect combinations destructive to any
data on the drives?  Like, would RAID6 kick in and start "fixing" parity
errors, even if I'm just reading?

If you don't want to risk your data, you could create a cowloop of
each device before writing to it, or dm snapshot using dmsetup.

I made a script for dmsetup snapshot on the side when I really needed
it because cowloop wouldn't compile. Here it is, it should help you
understand how it works!


RODATA=$1
shift
COWFILE=$1
shift
FSIZE=$1
shift
PREFIX=$1
shift

# After the four shifts above, any leftover argument is now $1.
if [ -z "$RODATA" ] || [ -z "$COWFILE" ] || [ -z "$FSIZE" ] || [ -n "$1" ]
then
	echo "Usage: $0 [read only device] [loop file] [size of loop in MB] {prefix}"
	echo "Read only device won't ever get a write."
	echo "Loop file can be a file or device where writes will be directed to."
	echo "Size is specified in MB, you will be able to write that much change to the device created."
	echo "Prefix will get prepended to all devices created by this script in /dev/mapper"
	exit 1
fi

MRODATA=$PREFIX${RODATA#/dev/}data
COWFILELOOP=$(losetup -f)
MCOWFILE=$PREFIX${RODATA#/dev/}cow
MSNAPSHOT=$PREFIX${RODATA#/dev/}snap


dd if=/dev/zero of=$COWFILE bs=1M seek=$FSIZE count=1
losetup $COWFILELOOP $COWFILE
echo "0 $(blockdev --getsz $RODATA) linear $RODATA 0" | dmsetup create $MRODATA
echo "0 $(blockdev --getsz $COWFILELOOP) linear $COWFILELOOP 0" | dmsetup create $MCOWFILE
echo "0 $(blockdev --getsz /dev/mapper/$MRODATA) snapshot /dev/mapper/$MRODATA /dev/mapper/$MCOWFILE p 64" | dmsetup create $MSNAPSHOT

echo "You can now use $MSNAPSHOT for your tests, up to ${FSIZE}MB."
exit 0


--Bart

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


