Re: Raid auto-assembly upon boot - device order

On 06/28/2011 08:01 AM, Pavel Hofman wrote:
> Hi Phil,

[...]

> Thanks a lot for your quick reply. And for your wonderful tool too.

You're welcome.

> orfeus:/boot# lsdrv
> PCI [AMD_IDE] 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
>  └─ide 2.0 HL-DT-ST RW/DVD GCC-H20N {[No Information Found]}
>     └─hde: [33:0] Empty/Unknown 4.00g
> PCI [sata_nv] 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
>  ├─scsi 0:0:0:0 ATA SAMSUNG HD753LJ {S13UJDWQ912345}
>  │  └─sda: [8:0] MD raid10 (4) 698.64g inactive {646f62e3:626d2cb3:05afacbb:371c5cc4}
>  │     └─sda1: [8:1] MD raid0 (0/2) 698.64g md3 clean in_sync {8c9c28dd:ac12a9ef:a6200310:fe6d9686}
>  │        └─md3: [9:3] MD raid1 (0/2) 2.03t md5 active in_sync 'orfeus:5' {2f88c280:3d7af418:e8d459c5:782e3ed2}
>  │           └─md5: [9:5] MD raid1 (1/2) 2.03t md7 active in_sync 'orfeus:7' {dde16cd5:2e17c743:fcc7926c:fcf5081e}
>  │              └─md7: [9:7] (xfs) 2.03t 'backup' {d987301b-dfb1-4c99-8f72-f4b400ba46c9}
>  │                 └─Mounted as /dev/md7 @ /mnt/raid
>  └─scsi 1:0:0:0 ATA ST3750330AS {9QK0VFJ9}
>     └─sdb: [8:16] Empty/Unknown 698.64g
>        └─sdb1: [8:17] MD raid0 (0/2) 698.64g md4 clean in_sync {ce213d01:e50809ed:a6200310:fe6d9686}
>           └─md4: [9:4] MD raid1 (0/2) 2.03t md6 active in_sync ''orfeus':6' {1f83ea99:a9e4d498:a6543047:af0a3b38}
>              └─md6: [9:6] MD raid1 (0/2) 2.03t md7 active spare ''orfeus':7' {dde16cd5:2e17c743:fcc7926c:fcf5081e}
> PCI [sata_nv] 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
>  ├─scsi 2:0:0:0 ATA ST31500341AS {9VS15Y1L}
>  │  └─sdc: [8:32] Empty/Unknown 1.36t
>  │     ├─sdc1: [8:33] MD raid1 (0/5) 10.24g md1 clean in_sync {588cbbfd:4835b4da:0d7a0b1c:7bf552bb}
>  │     │  └─md1: [9:1] (ext3) 10.24g {f620df1e-6dd6-43ab-b4e6-8e1fd4a447f7}
>  │     │     └─Mounted as /dev/md1 @ /
>  │     ├─sdc2: [8:34] MD raid1 (0/2) 8.38g md2 clean in_sync {28714b52:55b123f5:a6200310:fe6d9686}
>  │     │  └─md2: [9:2] (swap) 8.38g {1804bbc6-a61b-44ea-9cc9-ac3ce6f17305}
>  │     └─sdc3: [8:35] MD raid0 (1/2) 1.35t md3 clean in_sync {8c9c28dd:ac12a9ef:a6200310:fe6d9686}
>  └─scsi 3:0:0:0 ATA ST31500341AS {9VS13H4N}
>     └─sdd: [8:48] Empty/Unknown 1.36t
>        ├─sdd1: [8:49] MD raid1 (3/5) 10.24g md1 clean in_sync {588cbbfd:4835b4da:0d7a0b1c:7bf552bb}
>        ├─sdd2: [8:50] MD raid1 (1/2) 8.38g md2 clean in_sync {28714b52:55b123f5:a6200310:fe6d9686}
>        └─sdd3: [8:51] MD raid0 (1/2) 1.35t md4 clean in_sync {ce213d01:e50809ed:a6200310:fe6d9686}

Pretty deep layering.  I think I'm going to reduce the amount of indentation per layer.

> Still, you understood the setup at first glance without the visualisation :)
> 
>>
>>
>> I suspect it is merely timing.  You are using degraded arrays
>> deliberately as part of your backup scheme, which means you must be
>> using "start_dirty_degraded" as a kernel parameter.  That enables
>> md7, which you don't want degraded, to start degraded when md6 is a
>> hundred or so milliseconds late to the party.
> 
> Running rgrep on /etc and /boot reveals no such kernel parameter on this
> system. I have never had problems with the arrays not starting, perhaps
> it is hard-compiled in debian kernel (lenny)? Config for the current
> kernel in /boot does not list any such parameter either.
> 
> Would using this parameter just change the timing?

No.  Degraded arrays are not supposed to assemble without it.  Maybe it only applies to kernel autoassembly, which I no longer use.
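
One way to double-check whether the parameter is in effect is to look at the running kernel's command line instead of grepping config files.  A minimal Python sketch, assuming the parameter would show up either bare or with a module prefix such as "md-mod." (the helper name has_kernel_param is made up for illustration, and it does not look at module options set elsewhere):

```python
def has_kernel_param(cmdline, name="start_dirty_degraded"):
    """Return True if a token on the kernel command line sets `name`.

    Matches both the bare name and a module-prefixed form such as
    md-mod.start_dirty_degraded=1.  Dashes and underscores are treated
    alike, as the kernel does for parameter names.
    """
    wanted = name.replace("-", "_")
    for token in cmdline.split():
        # Drop any "=value" part, then normalise dashes to underscores.
        key = token.split("=", 1)[0].replace("-", "_")
        if key == wanted or key.endswith("." + wanted):
            return True
    return False
```

Feed it the text of /proc/cmdline from the running system, e.g. has_kernel_param(open("/proc/cmdline").read()).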

>> I think you have a couple options:
>>
>> 1) Don't run degraded arrays.  Use other backup tools.
> 
> It took me several years to find a reasonably fast way to offline-backup
> that partition with tens of millions of backuppc hardlinks :)

I've heard of hardlink horrors with backuppc.  I don't use it myself.  I prefer to use LVM on top of MD, then take compressed backups of LVM snapshots.

>> 2) Remove md7
>> from your mdadm.conf in your initramfs.  Don't let early userspace
>> assemble it.  The extra time should then allow your initscripts on
>> your real root fs to assemble it with both members.  This only works
>> if md7 does not contain your real root fs.
> 
> Fantastic, I will do so. Just have to find a way to keep different
> mdadm.conf in /etc and in initramfs while preserving the useful
> update-initramfs functionality :)

I haven't dug that deep.  I use dracut, myself.
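
As a rough idea of how the second copy could be generated rather than maintained by hand, here is a minimal Python sketch that drops selected ARRAY entries from mdadm.conf text.  Assumptions: the array is named /dev/md7 in the config (it may well be /dev/md/7 or similar on your system), keywords are written out in full, and the function name strip_arrays is invented for this example.  How to hook the result into update-initramfs is not covered.

```python
def strip_arrays(conf_text, skip_devices):
    """Return mdadm.conf text without the ARRAY entries in skip_devices.

    Lines that start with whitespace are continuations of the previous
    line in mdadm.conf, so they are dropped together with their ARRAY line.
    """
    kept = []
    skipping = False
    for line in conf_text.splitlines():
        is_continuation = line[:1] in (" ", "\t") and line.strip() != ""
        if not is_continuation:
            # A new logical line: decide afresh whether to drop it.
            words = line.split()
            skipping = (
                len(words) >= 2
                and words[0].lower() == "array"
                and words[1] in skip_devices
            )
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Calling strip_arrays(text, {"/dev/md7"}) on the contents of /etc/mdadm/mdadm.conf would give the text for the initramfs copy, leaving the file in /etc untouched.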

>>> Plus how can a background reconstruction be started on md6, if
>>> it is degraded and the other mirroring part is not even present?
>>
>> Don't know.  Maybe one of your existing drives is occupying a
>> major/minor combination that your esata drive occupied on your last
>> backup.  I'm pretty sure the message is harmless.  I noticed that md5
>> has a bitmap, but md6 does not.  I wonder if adding a bitmap to md6
>> would change the timing enough to help you.
> 
> Wow, the bitmap is indeed missing on md6. I swear it was there, in the
> past :) It significantly cuts down the synchronization time for offline
> copies. I have two offline drive sets - each rotating every two weeks.
> One offline set plugs into md5, the other one into md6. This way I can
> have two bitmaps, one for each set. Apparently, not now :-)

A mirror with a bitmap would make 1:1 backups faster.  I understand why you are doing this, but I'd be worried about filesystem integrity at the moment you disconnect the backup drive.  Have you performed any tests to be sure you can recover usable data from the offline copy?  If I recall correctly, an LVM snapshot operation incorporates a filesystem metadata sync.
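
To keep an eye on which arrays actually carry a bitmap, something along these lines could run before each backup rotation.  It is only a sketch: it assumes the usual /proc/mdstat layout, where an array with a bitmap has an indented "bitmap:" line in its stanza, and the function name arrays_with_bitmap is made up.

```python
import re


def arrays_with_bitmap(mdstat_text):
    """Map each md array named in /proc/mdstat text to True/False,
    depending on whether its stanza contains a "bitmap:" line."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\S*) :", line)
        if match:
            # Start of a new array stanza, e.g. "md5 : active raid1 ..."
            current = match.group(1)
            result[current] = False
        elif current and line.strip().startswith("bitmap:"):
            result[current] = True
        elif not line.strip():
            # A blank line ends the current stanza.
            current = None
    return result
```

Given the text of /proc/mdstat, an array that has lost its bitmap shows up with a False value.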

>> Relying on timing variations for successful boot doesn't sound great
>> to me.
> 
> You are right. Hopefully the significantly delayed assembly will work OK.
> 
> I very much appreciate your help, thanks a lot,
> 
> Pavel.

Phil