On Mon, 26 Nov 2012 15:48:42 -0800 Ross Boylan <ross@xxxxxxxxxxxxxxxx> wrote:

> I may have an explanation for what happened, including why md0 and md1
> were treated differently.
> On Fri, 2012-11-23 at 16:15 -0800, Ross Boylan wrote:
> > On Thu, 2012-11-22 at 15:52 +1100, NeilBrown wrote:
> > > On Wed, 21 Nov 2012 08:58:57 -0800 Ross Boylan <ross@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > > I spent most of yesterday dealing with the failure of my (md) RAID
> > > > arrays to come up on reboot.  If anyone can explain what happened or
> > > > what I can do to avoid it, I'd appreciate it.  Also, I'd like to know if
> > > > the failure of one device in a RAID 1 can contaminate the other with bad
> > > > data (I think the answer must be yes, in general, but I can hope).
> > > >
> > > > In particular, I'll need to reinsert the disks I removed (described
> > > > below) without getting everything screwed up.
> > > >
> > > > Linux 2.6.32 amd64 kernel.
> > > >
> > > > I'll describe what I did for md1 first:
> > > >
> > > > 1. At the start, the system has 3 physically identical disks.  sda and sdc
> > > > are twins and sdb is unused, though partitioned.  md1 is a raid1 of sda3
> > > > and sdc3.  Disks have DOS partitions.
> > > > 2. Add 2 larger drives to the system.  They become sdd and sde.  These 2
> > > > are physically identical to each other, and bigger than the first batch
> > > > of drives.
> > > > 3. GPT format the drives with larger partitions than sda.
> > > > 4. mdadm --fail /dev/md1 /dev/sdc3
> > > > 5. mdadm --add /dev/md1 /dev/sdd4.  Wait for sync.
> > > > 6. mdadm --add /dev/md1 /dev/sde4.
> > > > 7. mdadm --grow /dev/md1 -n 3.  Wait for sync.
> > > >
> > > > md0 was the same story except I only added sdd (and I used partitions sda1
> > > > and sdd2).
> > > >
> > > > This all seemed to be working fine.
> > > >
> > > > Reboot.
> > > >
> > > > System came up with md0 as sda1 and sdd2, as expected.
> > > > But md1 was the failed sdc3 only.  Note I did not remove the partition
> > > > from md1; maybe I needed to?

> First, the Debian initrd I'm using does recognize GPT partitions, and so
> unrecognized partitions did not cause the problem.
>
> Second, the initrd executes mdadm --assemble --scan --run --auto=yes.
> This uses conf/conf.d/md and etc/mdadm/mdadm.conf.  The latter includes
> --num-devices for each array.

Yes, having an out-of-date "devices=" in mdadm.conf would cause the problems
you are having.  You don't really want that at all.

> Since I did not regenerate this after
> changing the array sizes, it was 2 for both arrays.  man mdadm.conf says
> ARRAY  The ARRAY lines identify actual arrays.  The second word on the
>        line should be the name of the device where the array is normally
>        assembled, such as /dev/md1.  Subsequent words identify the array,
>        or identify the array as a member of a group.  If multiple
>        identities are given, then a component device must match ALL
>        identities to be considered a match.  [num-devices is one of the
>        identity keywords.]
>
> This was fine for md0 (unless it should have been 3 because of the
> failed device),

It should be the number of "raid devices", i.e. the number of active devices
when the array is optimal.  It ignores spares.

> and at least consistent with the metadata on sdc3,
> formerly part of md1.  It was inconsistent with the metadata for md1 on
> its current components, sda3, sdd4, and sde4, all of which indicate a
> size of 3 (or 4 if failed devices count).
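To make the "regenerate it" step concrete: refreshing the ARRAY lines after a
reshape might look roughly like this on a Debian-style setup (a sketch only;
the exact paths and the update-initramfs step are assumptions about that
layout rather than anything quoted from the thread):

    # What the kernel currently thinks the arrays contain
    cat /proc/mdstat
    mdadm --detail /dev/md0 /dev/md1

    # Emit fresh ARRAY lines from the running arrays.  These identify arrays
    # by UUID; strip any devices= or num-devices= fields if they appear, so
    # the next --grow cannot leave the config stale again.
    mdadm --detail --scan

    # After replacing the ARRAY lines in /etc/mdadm/mdadm.conf with that
    # output, rebuild the initramfs so the boot-time copy matches.
    update-initramfs -u -k all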
>
> I do not know if the "must match" logic applies to --num-devices (since
> the manual says the option is mainly for compatibility with the output
> of --examine --scan), nor do I know if the --run option overrides the
> matching requirement.  But md0's components might match the num-devices
> in mdadm.conf, while md1's current components do not match.  md1's old
> component does match.

Yes, "must match" means "must match".  And that is exactly what happened:
md1's old component was made into an array while the new components were
ignored.

>
> I don't know if, before all that, udev triggers attempts to assemble
> arrays incrementally.  Nor do I know how such incremental assembly works
> when some of the candidate devices are out of date.

"mdadm -I" (run from udev) pays more attention to the uuid than "mdadm -A"
does - it can only assemble one array with a given uuid.  (mdadm -A will
sometimes assemble 2.  That is the bug I mentioned in a previous email, which
will be fixed in mdadm-3.3.)
So it would see several devices with the same uuid, but some are inconsistent
with mdadm.conf and so would be rejected (I think).

>
> So the mismatch in the array size for md1, but not md0, might
> explain why md0 came up as expected, but md1 came up as a single, old
> partition instead of the 3 current ones.

s/might/does/

>
> However, it is awkward for this account that after I set the array sizes
> to 1 for both md0 and md1 (using partitions from sda)--which would be
> inconsistent with the size in mdadm.conf--they both came up.  There were
> fewer choices at that point, since I had removed all the other disks.

I guess that as "all" the devices with a given UUID were consistent, mdadm -I
accepted them even as "not present in mdadm.conf".

>
> Third, my recent experience suggests something more is going on, and
> perhaps the count considerations just mentioned are not that important.
> I'll put what happened at the end, since it happened after everything
> else described here.
>
> > > >
> > > > Shutdown, removed disk sdc from the computer.  Reboot.
> > > > md0 is reassembled too but md1 is not, and so the system can not
> > > > come up (since root is on md1).  BTW, md1 is used as a PV for LVM; md0
> > > > is /boot.
> > > >
> > > > In at least some kernels the GPT partitions were not recognized in the
> > > > initrd of the boot process (Knoppix 6--same version of the kernel,
> > > > 2.6.32, as my system, though I'm not sure the kernel modules are the same
> > > > as for Debian).  I'm not sure if the GPT partitions were recognized under
> > > > Debian in the initrd, though they obviously were in the running system
> > > > at the start.
> > >
> > > Well if your initrd doesn't recognise GPT, then that would explain your
> > > problems.
> > I later found, using the Debian initrd, that arrays with fewer than the
> > expected number of devices (as in the n= parameter) do not get activated.
> > I think that's what you mean by "explain your problems."  Or did you have
> > something else in mind?
> >
> > At least I think I found arrays with missing parts are not activated;
> > perhaps there was something else about my operations from knoppix 7
> > (described 2 paragraphs below this) that helped.
> >
> > The other problem with that discovery is that the first reboot activated
> > md1 with only 1 partition, even though md1 had never been configured
> > with <2.
> >
> > Most of my theories have the character of being consistent with some
> > behavior I saw and inconsistent with other observed behavior.  Possibly
> > I misperceived or misremembered something.
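One way to see in advance which components a boot-time assembly is likely to
accept is to compare each candidate's superblock against the ARRAY lines in
mdadm.conf.  A rough sketch, using the device names from this thread (the
grep fields are those printed by typical mdadm --examine output and can vary
a little between metadata versions):

    # Per-component view: array UUID, how many raid devices each superblock
    # expects, and how current each copy is (stale members show an older
    # Update Time and a lower Events count).
    mdadm --examine /dev/sda3 /dev/sdd4 /dev/sde4 /dev/sdc3 \
        | grep -E 'UUID|Raid Devices|Update Time|Events'

    # Summary in mdadm.conf format, for side-by-side comparison with the
    # ARRAY lines actually present in /etc/mdadm/mdadm.conf.
    mdadm --examine --scan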
> >
> > > > After much trashing, I pulled all drives but sda and sdb.  This was
> > > > still not sufficient to boot because the md's wouldn't come up.  md0 was
> > > > reported as assembled, but was not readable.  I'm pretty sure that was
> > > > because it wasn't activated (--run) since md was waiting for the
> > > > expected number of disks (2).  md1, as before, wasn't assembled at all.
> > > >
> > > > From knoppix (v7, 32 bit) I activated both md's and shrunk them to size
> > > > 1 (--grow --force -n 1).  In retrospect this probably could have been
> > > > done from the initrd.
> > > >
> > > > Then I was able to boot.
> > > >
> > > > I repartitioned sdb and added it to the RAID arrays.  This led to hard
> > > > disk failures on sdb, though the arrays eventually were assembled.  I
> > > > failed and removed the sdb partitions from the arrays and shrunk them.
> > > > I hope the bad sdb has not screwed up the good sda.
> > >
> > > It's not entirely impossible (I've seen it happen) but it is very unlikely
> > > that hardware errors on one device will "infect" the other.
> > Our local sysadmin also believes the errors in sdb were either
> > corrected, or resulted in an error code, rather than ever sending bad
> > data back.  I'm proceeding on the assumption sda is OK.
> >
> > > > Thanks for any assistance you can offer.
> > >
> > > What sort of assistance are you after?
> > I'm trying to understand what happened and how to avoid having it happen
> > again.
> >
> > I'm also trying to understand under what conditions it is safe to insert
> > disks that have out-of-date versions of arrays on them.
> > >
> > > first question is: does the initrd handle GPT.  If not, fix that first.
> > That is the first thing I'll check when I'm at the machine.  The problem
> > with the "initrd didn't recognize GPT" theory was that in my very first
> > reboot md0 was assembled from two partitions, one of which was on a GPT
> > disk.  (Another example of "all my theories have contradictory evidence.")
> >
> > Ross

> After running for a while with both RAIDs having size 1 and using sda
> exclusively, I shut down the system, removed the physically failing sdb,
> and added the 2 GPT disks, formerly known as sdd and sde.  sdd has
> partitions that were part of md0 and md1; sde has a partition that was
> part of md1.  For simplicity I'll continue to refer to them as sdd and
> sde, even though they were called sdb and sdc in the new configuration.
>
> This time, md0 came up with sdd2 (which is old) only and md1 came up
> correctly with sda3 only.  Substantively sdd2 and sda1 are identical,
> since they hold /boot and there have been no recent changes to it.
>
> This happened across 2 consecutive boots.  Once again, the older device
> (sdd2) was activated in preference to the newer one (sda1).
>
> In terms of counts for md0, mdadm.conf continued to indicate 2; sda1
> indicates 1 device; and sdd2 indicates 2 devices + 1 failed device.

That is why mdadm preferred sdd2 to sda1 - it matched mdadm.conf better.

I strongly suggest that you remove all "devices=" entries from mdadm.conf.

NeilBrown

>
> BTW, by using break=bottom as a kernel parameter one can interrupt the
> initrd just after mdadm has run and see if the mappings are right.  For
> the 2nd boot I did just that, and then manually shut down md0 and brought
> it back with sda1.  The code appears to offer break=post-mdadm as an
> alternative, but that did not work for me (there was no break).  These
> are Debian-specific tweaks, I believe.
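For completeness, the manual fix-up at that break=bottom prompt amounts to
something like the following (a sketch only, using the device names from this
thread; it assumes the Debian initramfs shell has mdadm available and that
exiting the shell resumes the normal boot):

    # at the (initramfs) prompt reached via break=bottom
    cat /proc/mdstat                 # see what was assembled, and from what
    mdadm --stop /dev/md0            # tear down the wrongly-assembled array
    mdadm --assemble --run /dev/md0 /dev/sda1
                                     # bring it back from the intended member
                                     # only, even though it will be degraded
    exit                             # continue booting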
>
> Ross
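On the remaining question of when it is safe to re-insert disks that carry
stale copies of the arrays, one conservative approach is sketched below (an
illustration rather than advice from the thread; device names are the ones
used above, and the --examine step is there precisely to confirm you are
wiping the stale copy and not the live one):

    # Confirm this really is the out-of-date member (old Update Time, low
    # Events count) before touching it.
    mdadm --examine /dev/sdd4

    # Erase its md superblock so it can never again be assembled as an
    # alternative version of the array at boot.
    mdadm --zero-superblock /dev/sdd4

    # Re-add it as a fresh member and let it resync from the good copy.
    mdadm --add /dev/md1 /dev/sdd4
    watch cat /proc/mdstat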