Re: Persistent failures with simple md setup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 03 Mar 2013 20:31:57 +0100 Hans-Peter Jansen <hpj@xxxxxxxxx> wrote:

> Am Donnerstag, 28. Februar 2013, 23:16:01 schrieb Hans-Peter Jansen:
> > Am Freitag, 1. März 2013, 08:25:20 schrieb NeilBrown:
> > > On Thu, 28 Feb 2013 11:49:53 +0100 Hans-Peter Jansen <hpj@xxxxxxxxx>
> > > wrote:
> > > 
> > > This is what I asked for:
> > > > > It really feels like a udev bug, but it is hard to be sure.
> > > > > Could you edit /etc/init.d/boot.md and add
> > > > > 
> > > > >    echo BEFORE mdadm -IRs > /dev/kmsg
> > > > > 
> > > > > just before the call the "$mdadm_BIN -IRs",
> > > > > 
> > > > >    echo AFTER mdadm -IRs > /dev/kmsg
> > > > > 
> > > > > just after that call, and
> > > > > 
> > > > >    echo AFTER mdadm -Asc > /dev/mksg
> > > > > 
> > > > > just after the call to "$mdadm_BIN -A -s -c $mdadm_CONFIG"
> > > > > 
> > > > > And maybe put similar messages just before and after the
> > > > > 
> > > > >     /sbin/udevadm settle ....
> > > > > 
> > > > > call.
> > > > > 
> > > > > If you can then reproduce the problem, the extra logging might tell us
> > > > > something useful.
> > > 
> > > This is not at all the same thing:
> > > > I've enabled initrd (linuxrc=trace) and boot.md (#!/bin/sh -x)
> > > > debugging,
> > > > hence we should see the full story next time.
> > > 
> > > It might help, but I'm far from certain that it will be nearly as useful.
> 
> I see, sh debug messages do not condense in /var/log/something, they're routed
> somewhere else. I not sure, how I like that systemd implied sillyness.
> 
> > Okay, okay, I do it both ways now..
> > 
> > Let's see, when it triggers.
> 
> I'm not sure, how the basic kernel boot contribute to boiling this issue down, 
> but here we go: (not censored this time, hope it will go though..)



Thanks.  The interesting bit is here:

> Mar  3 09:11:59 zaphkiel kernel: [   28.781703] md: bind<sdb4>
It seems that udev is running "mdadm -I" on sdb4 here

> Mar  3 09:11:59 zaphkiel kernel: [   28.799939] tveeprom 1-0050: audio processor is MSP3415 (idx 6)
> Mar  3 09:11:59 zaphkiel kernel: [   28.862699] sr 7:0:0:0: Attached scsi generic sg6 type 5
> Mar  3 09:11:59 zaphkiel kernel: [   28.894667] tveeprom 1-0050: has radio
> Mar  3 09:11:59 zaphkiel kernel: [   28.917194] bttv0: Hauppauge eeprom indicates model#44354
> Mar  3 09:11:59 zaphkiel kernel: [   28.984037] bttv0: tuner type=20
> Mar  3 09:11:59 zaphkiel kernel: [   29.003670] md: bind<sdb1>

and "mdadm -I sdb1"  here
> Mar  3 09:11:59 zaphkiel kernel: [   29.026123] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:07.0/input/input5
> Mar  3 09:11:59 zaphkiel kernel: [   29.098905] i2c-core: driver [msp3400] using legacy suspend method
> Mar  3 09:11:59 zaphkiel kernel: [   29.139227] AFTER udevadm settle --timeout=60

and now udev is "settled".  This is much less than 60 seconds after the
"BEFORE udevadm settle", so it didn't time out - it must have exhausted the
queue.

> Mar  3 09:11:59 zaphkiel kernel: [   29.211092] i2c-core: driver [msp3400] using legacy resume method
> Mar  3 09:11:59 zaphkiel kernel: [   29.276147] msp3400 1-0040: MSP3415D-B3 found @ 0x80 (bt878 #0 [sw])
> Mar  3 09:11:59 zaphkiel kernel: [   29.325203] md: bind<sdb2>

But look - it seems that "mdadm -I /dev/sdb2" was just run - presumably by
udev.

> Mar  3 09:11:59 zaphkiel kernel: [   29.391360] msp3400 1-0040: msp3400 supports nicam, BEFORE mdadm -IRs

We managed to get two lines blended together here, but it is clear that this
is just before "mdadm -IRs" runs.

> Mar  3 09:11:59 zaphkiel kernel: [   29.455183] mode is autodetect
> Mar  3 09:11:59 zaphkiel kernel: [   29.525553] md/raid1:md0: active with 1 out of 2 mirrors
> 

And unsurprisingly, md0 is assembled with only on mirror because we have seen
sdb1 but not sda1

> Mar  3 09:11:59 zaphkiel kernel: [   29.572395] i2c-core: driver [tuner] using legacy suspend method
> Mar  3 09:11:59 zaphkiel kernel: [   29.642012] i2c-core: driver [tuner] using legacy resume method
> Mar  3 09:11:59 zaphkiel kernel: [   29.688786] md: bind<sda4>

and here comes sda4.  Presumably sda1 is attempted a little later, but as md0
is already assembled, mdadm refuses to do anything with it.

So "mdadm -IRs" is being run too early, apparently because "udevadm settle" is
exiting too early.  However I have looked closely at the udev code, and run
various tests, and cannot see how it could possibly exit early.  I can
imagine it exiting a bit late in some anomalous situations, but not early.

The only way I can explain this is to suggest that something other than udev
is running "mdadm -I".  But I cannot imagine what would be doing that.

If you put a "sleep 10" before the "mdadm -IRs", I suspect your problems
would go away.  It isn't really a nice solution, but it is the only one I can
think of at the moment.

NeilBrown

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux