Re: RAID 6, 6 device array - all devices lost superblock

Phil Turmel <philip@xxxxxxxxxx> · Sun, 28 Aug 2022 12:47:03 -0400

Hi Peter, et al,

On 8/28/22 05:54, Wols Lists wrote:
On 28/08/2022 10:14, Wols Lists wrote:
Currently I have no /dev/md* devices.
I have access to the old mdadm.conf file - have tried assembling with
it, with the default mdadm.conf, and with no mdadm.conf file in /etc
and /etc/mdadm.

It looks like the drives weren't partitioned :-( I think you're into 
forensics.

It is too soon to say this.  The supplied mdadm.conf file does not
contain specific partition information.  It is possible the partition
tables have just been wiped.

Whoops - my system froze while I was originally writing my reply, and I 
forgot to put this into my rewrite ...

Look up overlays in the wiki. I've never done it myself, but a fair few 
people have said the instructions worked a treat.

You're basically making the drives read-only (all writes get dumped into 
the overlay file), and then re-creating the array over the top, so you 
can test whether you got it right. If you don't, you just ditch the 
overlays and start again, if you did get it right you can recreate the 
array for real.

Cheers,
Wol

On 8/28/22 11:10, John Stoffel wrote:
"Peter" == Peter Sanders <plsander@xxxxxxxxx> writes:

Peter> have a RAID 6 array, 6 devices.  Been running it for years without much issue.
Peter> Had hardware issues with my system - ended up replacing the
Peter> motherboard, video card, and power supply and re-installing the OS
Peter> (Debian 11).

Can you give us details on the old vs new motherboard/cpu?  It might
be that you need to tweak the BIOS of the motherboard to expose the
old SATA formats as well.  

Did you install Debian onto a fresh boot disk?  Is your BIOS set up to
boot only from UEFI devices?  Check your BIOS settings to confirm that
the data drives are all in AHCI mode, or possibly even in IDE mode.  It
all depends on how old the original hardware was.

I just recently upgraded from a 2010 MB/CPU combo and I had to tweak
the BIOS defaults to see my disks.  I guess I should do a clean
install from a blank disk, but I wanted to minimize downtime.  

It is important to end up in AHCI mode on all MOBO ports.  If not set 
that way now, please change them.

Wols has some great advice here, and I heartily recommend that you use
overlays when doing your testing.  Check the Linux RAID wiki for
suggestions.

Concur.
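
For readers who have not used overlays before, here is a rough sketch of the idea: each drive gets a device-mapper snapshot whose copy-on-write store is a sparse file, so test writes land in the file and never on the disk. The function below only builds the command strings and runs nothing. The function name, the overlay directory, the 4G overlay size, and the explicit loop device are assumptions for illustration; the sector count would come from `blockdev --getsz`, and the wiki's own instructions should be treated as the authority.

```python
import shlex


def overlay_plan(device, sectors, loop_dev,
                 overlay_dir="/tmp/overlays", overlay_size="4G"):
    """Return the shell commands (as strings) that would place a
    copy-on-write overlay in front of `device`. Nothing is executed."""
    name = device.rsplit("/", 1)[-1]            # "/dev/sda" -> "sda"
    overlay_file = f"{overlay_dir}/overlay-{name}"
    # dm snapshot table: <start> <length> snapshot <origin> <COW dev> <P|N> <chunk sectors>
    table = f"0 {sectors} snapshot {device} {loop_dev} P 8"
    return [
        # Sparse file that will absorb all writes.
        f"truncate -s {overlay_size} {shlex.quote(overlay_file)}",
        # Expose the sparse file as a block device.
        f"losetup {shlex.quote(loop_dev)} {shlex.quote(overlay_file)}",
        # Creates /dev/mapper/<name>; the original disk is only read.
        f"dmsetup create {shlex.quote(name)} --table {shlex.quote(table)}",
    ]


if __name__ == "__main__":
    # The sector count here is a placeholder, not a value from this thread.
    for cmd in overlay_plan("/dev/sda", 5860533168, "/dev/loop0"):
        print(cmd)
```

Any trial assembly or re-creation would then be pointed at the /dev/mapper devices rather than at the raw disks.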

And don't panic!  Your data is probably there, but just missing the
superblocks or partition tables.

Both, I suspect.

On 8/27/22 22:00, Peter Sanders wrote:
lsdrv ------------------------
PCI [nvme] 01:00.0 Non-Volatile memory controller: Phison Electronics
Corporation E12 NVMe Controller (rev 01)
└nvme nvme0 PCIe SSD                                 {21112925606047}
 └nvme0n1 238.47g [259:0] Partitioned (dos)
  ├nvme0n1p1 485.00m [259:1] ext4 {f38776ac-1ce9-4fc8-ba50-94844b9f504e}
  │└Mounted as /dev/nvme0n1p1 @ /boot
  ├nvme0n1p2 1.00k [259:2] Partitioned (dos)
  ├nvme0n1p5 60.54g [259:3] ext4 {5ee1c3c0-3a05-466c-9f98-f5807c8d813b}
  │└Mounted as /dev/nvme0n1p5 @ /
  ├nvme0n1p6 93.13g [259:4] ext4 {9064169f-4fe3-4836-a906-28c1b445cdff}
  │└Mounted as /dev/nvme0n1p6 @ /var
  ├nvme0n1p7 37.00m [259:5] ext4 {25e161ad-94a0-4298-afaf-18e2433766ee}
  ├nvme0n1p8 82.89g [259:6] ext4 {ac874071-d759-4d33-b32f-83272f3eacd9}
  │└Mounted as /dev/nvme0n1p8 @ /home
  └nvme0n1p9 1.41g [259:7] swap {02cef84b-9a9d-4a0a-973c-fda1a78c533c}
PCI [pata_jmicron] 26:00.1 IDE interface: JMicron Technology Corp.
JMB368 IDE controller (rev 10)
└scsi 0:0:0:0 MAD DOG  LS-DVDRW TSH652M {MAD_DOG_LS-DVDRW_TSH652M}
 └sr0 1.00g [11:0] Empty/Unknown
PCI [ahci] 26:00.0 SATA controller: JMicron Technology Corp. JMB363
SATA/IDE Controller (rev 10)
└scsi 2:x:x:x [Empty]
PCI [ahci] 2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
FCH SATA Controller [AHCI mode] (rev 51)
├scsi 6:0:0:0 ATA      TOSHIBA HDWD130  {477ALBNAS}
│└sda 2.73t [8:0] Partitioned (PMBR)
└scsi 7:0:0:0 ATA      TOSHIBA HDWD130  {Y7211KPAS}
 └sdc 2.73t [8:32] Partitioned (gpt)
PCI [ahci] 2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
FCH SATA Controller [AHCI mode] (rev 51)
├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC1T0668790}
│└sdb 2.73t [8:16] Partitioned (gpt)
├scsi 9:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N0091255}
│└sdd 2.73t [8:48] Partitioned (gpt)
├scsi 12:0:0:0 ATA      WDC WD30EZRX-00M {WD-WCAWZ2669166}
│└sde 2.73t [8:64] Partitioned (gpt)
└scsi 13:0:0:0 ATA      TOSHIBA HDWD130  {477ABEJAS}
 └sdf 2.73t [8:80] Partitioned (gpt)

Unfortunately, my lsdrv tool is not able to reconstruct missing parts. 
It is most useful when used on a *good* system and *saved* for help 
diagnosing *future* problems.

Please share your /etc/fstab, and if you were using LVM on top of the 
raid, share your lvm.conf and anything in /etc/lvm/backup.

Please describe the layer(s) that were on top of the raid.

We need to help you look for signatures, and it helps to be selective in 
what signatures to look for.
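
As a minimal sketch of what a selective signature search could look like, assuming the signatures of interest sit at the start of a 512-byte sector: the scanner below reads a stream sequentially and reports sectors that begin with a known pattern. The table holds three well-known values (the md superblock magic 0xa92b4efc stored little-endian, the LVM2 "LABELONE" label, and the GPT "EFI PART" header); the function name and table layout are illustrative only, and the table should be trimmed or extended to match the layers that were actually in use.

```python
SECTOR = 512

# Byte patterns expected at the very start of a sector.
SIGNATURES = {
    b"\xfc\x4e\x2b\xa9": "md superblock magic (0xa92b4efc, little-endian)",
    b"LABELONE": "LVM2 physical volume label",
    b"EFI PART": "GPT header",
}


def scan_signatures(stream, max_bytes):
    """Read `stream` sector by sector, up to `max_bytes`, and return a list
    of (byte_offset, description) for sectors starting with a signature."""
    hits = []
    offset = 0
    while offset < max_bytes:
        sector = stream.read(SECTOR)
        if len(sector) < SECTOR:
            break                      # end of device or short read
        for magic, label in SIGNATURES.items():
            if sector.startswith(magic):
                hits.append((offset, label))
        offset += SECTOR
    return hits
```

On a real drive the stream would be opened read-only, for example `open("/dev/sdX", "rb")`, which keeps the scan consistent with not writing to the disks.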

After that, we will want to figure out your raid's chunk size and data 
offsets.  If you know of a particular large file (8MB or larger) that is 
sure to be in the raid and you happen to have a copy tucked away, then 
my findHash[1] tool might be able to definitively determine those 
values.  (Time consuming, though.)
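
To illustrate why a known file helps (this is not the findHash implementation, only a sketch of the general idea): blocks of a file stored on the array appear verbatim on the data members, so hashing the file block by block and looking for those hashes on a member shows where pieces of it landed. The length of each contiguous run on one member hints at the chunk size, and the run positions constrain the data offset. The 4 KiB block size, the alignment assumption, and the function names are all assumptions; a real scan would stream the device rather than hold it in memory.

```python
import hashlib

BLOCK = 4096  # assumed filesystem block size and alignment


def block_digests(data):
    """Map digest -> block index for each full 4 KiB block of `data`."""
    table = {}
    for i in range(len(data) // BLOCK):
        digest = hashlib.sha1(data[i * BLOCK:(i + 1) * BLOCK]).digest()
        table.setdefault(digest, i)   # keep the first index for repeated blocks
    return table


def find_known_blocks(image, known):
    """Return [(image_offset, file_block_index)] for every 4 KiB-aligned
    block of `image` whose content matches a block of `known`."""
    table = block_digests(known)
    matches = []
    for off in range(0, len(image) - BLOCK + 1, BLOCK):
        digest = hashlib.sha1(image[off:off + BLOCK]).digest()
        if digest in table:
            matches.append((off, table[digest]))
    return matches
```

A file containing long runs of zeros or other repeated blocks would produce many ambiguous matches, so a compressed file such as a video or archive is a better candidate.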

Meanwhile, don't do *anything* that would write to those drives.

Phil

[1] https://github.com/pturmel/findHash