Re: RAID-6 aborted reshape

On Tue, Jun 11, 2019 at 9:16 AM Andreas Klauer
<Andreas.Klauer@xxxxxxxxxxxxxx> wrote:
>
> On Sat, Jun 08, 2019 at 10:47:30AM -0500, Colt Boyd wrote:
> > I’ve also since tried to reassemble it with the following create but
> > the XFS fs is not accessible:
> > 'mdadm --create /dev/md0 --data-offset=1024 --level=6 --raid-devices=5
> > --chunk=1024K --name=OMV:0 /dev/sdb1 /dev/sde1 /dev/sdc1 /dev/sdd1
> > /dev/sdf1 --assume-clean --readonly'
>
> Well, all sorts of things can go wrong with a re-create.
> You should be using overlays for such experiments.
>
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

Thanks, I am using overlays now, but was not during the first hour
after the disaster. The superblock on the 6th device (raid device 5)
is still intact, but the superblocks on raid devices 0-4 were
overwritten when I ran the create.
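
For reference, the overlay setup from that wiki page boils down to
roughly the following (a sketch only; the overlay directory, the 4G
overlay size and the device names are just examples):

    # make the real members read-only and give each one a copy-on-write
    # overlay, so experiments never touch the original drives
    mkdir -p /overlay
    for d in sdb1 sdc1 sdd1 sde1 sdf1 sdg1; do
        blockdev --setro /dev/$d
        truncate -s 4G /overlay/$d.ovl                 # sparse CoW file
        loop=$(losetup -f --show /overlay/$d.ovl)
        size=$(blockdev --getsz /dev/$d)               # size in 512-byte sectors
        echo "0 $size snapshot /dev/$d $loop P 8" | dmsetup create ${d}-ovl
    done
    # all further mdadm experiments then use /dev/mapper/sd?1-ovl

All writes land in the sparse overlay files, so a failed experiment
can simply be thrown away with dmsetup remove and a fresh set of
overlays created.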

> Also, which kernel version are you running?

4.19.0-0.bpo.2-amd64

> I think there was a RAID6 bug recently in kernel 5.1.3 or so.
>
> https://www.spinics.net/lists/raid/msg62645.html
>
> > /dev/sdg1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x1
> >      Array UUID : f8fdf8d4:d173da32:eaa97186:eaf88ded
> >            Name : OMV:0
> >   Creation Time : Mon Feb 24 18:19:36 2014
> >      Raid Level : raid6
> >    Raid Devices : 6
> >
> >  Avail Dev Size : 5858529280 (2793.56 GiB 2999.57 GB)
> >      Array Size : 11717054464 (11174.25 GiB 11998.26 GB)
> >   Used Dev Size : 5858527232 (2793.56 GiB 2999.57 GB)
> >     Data Offset : 2048 sectors
> >    Super Offset : 8 sectors
> >    Unused Space : before=1960 sectors, after=2048 sectors
> >           State : clean
> >     Device UUID : 92e022c9:ee6fbc26:74da4bcc:5d0e0409
> >
> > Internal Bitmap : 8 sectors from superblock
> >     Update Time : Thu Jun  6 10:24:34 2019
> >   Bad Block Log : 512 entries available at offset 72 sectors
> >        Checksum : 8f0d9eb9 - correct
> >          Events : 1010399
> >
> >          Layout : left-symmetric
> >      Chunk Size : 1024K
> >
> >    Device Role : Active device 5
> >    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>
> This already believes to have 6 drives, not in mid-reshape.
> What you created has 5 drives... that's a bit odd.
>
> It could still be normal, metadata for drives that get kicked out
> is no longer updated after all, and I haven't tested it myself...
>
> --examine of the other drives (before re-create) would be interesting.
> If those are not available, maybe syslogs of the original assembly,
> reshape and subsequent recreate.

The other drives got their superblocks overwritten (with the exception
of raid device 5, which was being added). Here are the relevant
sections from the syslogs.

Jun  6 10:12:25 OMV1 kernel: [    2.141772] md/raid:md0: device sde1 operational as raid disk 1
Jun  6 10:12:25 OMV1 kernel: [    2.141774] md/raid:md0: device sdc1 operational as raid disk 2
Jun  6 10:12:25 OMV1 kernel: [    2.141789] md/raid:md0: device sdd1 operational as raid disk 3
Jun  6 10:12:25 OMV1 kernel: [    2.141792] md/raid:md0: device sdf1 operational as raid disk 4
Jun  6 10:12:25 OMV1 kernel: [    2.141796] md/raid:md0: device sdb1 operational as raid disk 0
Jun  6 10:12:25 OMV1 kernel: [    2.142877] md/raid:md0: raid level 6 active with 5 out of 5 devices, algorithm 2
Jun  6 10:12:25 OMV1 kernel: [    2.196783] md0: detected capacity change from 0 to 8998697828352
Jun  6 10:12:25 OMV1 kernel: [    3.885628] XFS (md0): Mounting V4 Filesystem
Jun  6 10:12:25 OMV1 kernel: [    4.213947] XFS (md0): Ending clean mount
Jun  6 10:12:25 OMV1 kernel: [    4.220989] XFS (md0): Quotacheck needed: Please wait.
Jun  6 10:12:25 OMV1 kernel: [    7.200429] XFS (md0): Quotacheck: Done.
<snip>
Jun  6 10:17:40 OMV1 kernel: [  321.232145] md: reshape of RAID array md0
Jun  6 10:17:40 OMV1 systemd[1]: Created slice system-mdadm\x2dgrow\x2dcontinue.slice.
Jun  6 10:17:40 OMV1 systemd[1]: Started Manage MD Reshape on /dev/md0.
Jun  6 10:17:40 OMV1 systemd[1]: mdadm-grow-continue@md0.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun  6 10:17:40 OMV1 systemd[1]: mdadm-grow-continue@md0.service: Unit entered failed state.
Jun  6 10:17:40 OMV1 systemd[1]: mdadm-grow-continue@md0.service: Failed with result 'exit-code'.
Jun  6 10:18:02 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:18:32 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:19:02 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:19:32 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:20:02 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:20:32 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:21:02 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:21:32 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:22:02 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:22:32 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:23:02 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:23:32 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:24:02 OMV1 monit[1170]: 'filesystem_media_4e2b0464-e81b-49d9-a520-b574799452f8' space usage 88.1% matches resource limit [space usage>85.0%]
Jun  6 10:24:28 OMV1 systemd[1]: Unmounting /export/Shared...
Jun  6 10:24:28 OMV1 systemd[1]: Removed slice system-mdadm\x2dgrow\x2dcontinue.slice.
<snip> - server shutting down
Jun  6 10:24:28 OMV1 systemd[1]: openmediavault-engined.service: Killing process 1214 (omv-engined) with signal SIGKILL.


> Otherwise you have to look at the raw data (or try blindly)
> to figure out the data layout.
>
> Please use overlays for experiments...
>
> Good luck.

I am now; if only I had started with them, this might have been
easier. Is there any way to rebuild the superblocks from the remaining
drive and/or the backup file? And if so, would that be better?
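
To make the question concrete, this is the kind of experiment I can
now run safely against the overlays, reusing the parameters recorded
in the surviving superblock above (5 devices in the order the kernel
logged them, exactly as in my earlier create; whether it should
instead be 6 devices ending with the old raid device 5 is precisely
what I am unsure about):

    # the surviving superblock still records the original geometry
    mdadm --examine /dev/sdg1

    # candidate re-create, run only against the overlay devices
    mdadm --create /dev/md0 --assume-clean --readonly \
          --level=6 --raid-devices=5 --chunk=1024K --layout=left-symmetric \
          --data-offset=1024 --name=OMV:0 \
          /dev/mapper/sdb1-ovl /dev/mapper/sde1-ovl /dev/mapper/sdc1-ovl \
          /dev/mapper/sdd1-ovl /dev/mapper/sdf1-ovl

    # non-destructive check of whether the layout guess was right
    xfs_repair -n /dev/md0
    mdadm --stop /dev/md0

If one of these candidates produces a filesystem that xfs_repair -n
accepts, I assume I would then repeat only that create against the
real devices, but I would like confirmation before doing anything
destructive.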

Thanks,
Colt

> Regards
> Andreas Klauer



