Re: How to assemble 4-disk raid5 with one broken disk and one marked as spare by operator error?

On Sun, 8 Dec 2013 21:04:42 +0100 (CET) Tomas Agartz <tlund@xxxxxx> wrote:

> After booting a server that had been powered off for some time, the 4-disk 
> raid5 device was up and running in read-only mode with one disk missing. 
> After a, in hindsight, hasty decision, "mdadm --manage --add /dev/md0 
> /dev/sdd" was executed to re-add the missing device to the array.
> 
> At this time, all hell broke loose :) The first thing that happened was 
> that sdd was added as a spare instead of re-added as expected. The second 
> thing was that a different disk, sdb, was kicked from the array because of 
> read/sata-bus errors. The root disk also bailed and the system had to be 
> powercycled.

If you want to re-add, it is safest to ask mdadm to --re-add, not to --add.
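As a rough illustration of the difference: as I read the man page, --re-add should either put the device back in its old slot or fail, whereas --add may fall back to adding it as a spare. The sketch below only prints the command rather than running it; the function name readd_cmd is made up for this example, and /dev/md0 and /dev/sdd are the names from the report.

```shell
#!/bin/sh
# Sketch only: builds and prints the --re-add command, runs nothing.
# readd_cmd is a hypothetical helper, not part of mdadm.
readd_cmd() {
    # $1 = array device, $2 = member device to put back
    printf '%s\n' "mdadm --manage $1 --re-add $2"
}

# Print the command for review before running it by hand as root.
readd_cmd /dev/md0 /dev/sdd
```

If the printed command fails when run, that failure is the useful signal: stop and look at --examine before trying anything else, rather than retrying with --add.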

> 
> The real problem, from the start, was probably that sdb was bad all along, 
> but for some reason sdd was the device missing from the array after the 
> initial boot.
> 
> Trying to read data from sdb gives read errors and timeouts, but I was 
> able to do "mdadm --examine" after resetting the sata port.
> 
> The current state is that, out of 4 disks two are good (sde and sdf), one 
> is (in error) marked as a spare (sdd), and the fourth device is unusable 
> (sdb).
> 
> What is the correct method to change the spare disk back to a data disk 
> and try to restart the array with 3 out of 4 devices (sdd, sde and sdf)?
> 

The only real option at this point is to --create the array.  There isn't
enough information for mdadm to be able to do anything clever.
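With --create, the order of the devices on the command line decides which slot each one gets, so the list has to match the original slots. A minimal sketch that prints a candidate command, assuming the roles in the --examine output quoted below: slot 0 for sdd is the poster's guess (its superblock now only says "spare"), and the metadata, chunk and layout values are the ones reported in the original message. The variable and function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: prints a candidate --create command, runs nothing.
# Slot assignments are assumptions taken from the quoted --examine output.
slot0=/dev/sdd   # assumed: the only slot not claimed by another device
slot1=/dev/sde   # "Active device 1"
slot2=/dev/sdf   # "Active device 2"
slot3=missing    # sdb ("Active device 3") is unreadable and has older events

create_cmd() {
    opts="--level=5 --raid-devices=4 --metadata=1.2 --chunk=512"
    opts="$opts --layout=left-symmetric --assume-clean"
    printf '%s\n' "mdadm --create /dev/md0 $opts $slot0 $slot1 $slot2 $slot3"
}

create_cmd
```

Leaving the failed slot as "missing" keeps the array degraded, so nothing gets resynced onto a disk in the wrong position. It is worth checking the result read-only before letting anything write to it.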

> The device has never had a spare, so I think that sdd used to be "Active 
> device 0" before this happened?
> 
> Possibly relevant data from mdadm --examine on the four devices:
> 
> sdb          State : clean
> sdb         Events : 333560
> sdb   Device Role : Active device 3
> sdb   Array State : .AAA ('A' == active, '.' == missing)
> 
> sdd          State : clean
> sdd         Events : 333562
> sdd   Device Role : spare
> sdd   Array State : .AA. ('A' == active, '.' == missing)
> 
> sde          State : clean
> sde         Events : 333562
> sde   Device Role : Active device 1
> sde   Array State : .AA. ('A' == active, '.' == missing)
> 
> sdf          State : clean
> sdf         Events : 333562
> sdf   Device Role : Active device 2
> sdf   Array State : .AA. ('A' == active, '.' == missing)
> 
> If no one else has any better suggestions, my best guess would be to: 
> "mdadm --create /dev/md0 --level=5 --raid-devices=4 --assume-clean 
> /dev/sdd /dev/sde /dev/sdf missing" (the device was created with default 
> values, metadata 1.2, chunk size 512K, layout left-symmetric).

Check the "Data Offset" of the devices and make sure the newly created array
gets the same "Data Offset" (it can be set explicitly with the --data-offset
option in the latest mdadm).
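A small sketch of that check, assuming --examine prints the offset as "Data Offset : N sectors" with 512-byte sectors, and that --data-offset takes KiB when no suffix is given (please confirm both against the man page for your mdadm version). The helper names are made up for this example, and the value 2048 in the sample line is invented, not taken from these disks.

```shell
#!/bin/sh
# Sketch: extract "Data Offset" from `mdadm --examine` text and convert
# it from 512-byte sectors to KiB. Helper names are hypothetical.
data_offset_sectors() {
    # Reads --examine output on stdin, prints the sector count.
    awk -F' : ' '/Data Offset/ { split($2, a, " "); print a[1]; exit }'
}

sectors_to_kib() {
    # Two 512-byte sectors per KiB.
    echo $(( $1 / 2 ))
}

# Sample line in the assumed --examine format; 2048 is a made-up value.
sample='    Data Offset : 2048 sectors'
s=$(printf '%s\n' "$sample" | data_offset_sectors)
echo "offset: $s sectors = $(sectors_to_kib "$s") KiB"
```

On the real system the input would come from something like "mdadm --examine /dev/sde | data_offset_sectors", run for each surviving device; all of them should report the same value, and that value is the one the new array needs.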

NeilBrown


> 
> (Other crazy ideas involve editing the superblock of sdd and making it 
> device 0 and then trying to start the array after that).
> 
> Best regards,
> Tomas
