On Sun, 03 Mar 2013 20:31:57 +0100 Hans-Peter Jansen <hpj@xxxxxxxxx> wrote: > Am Donnerstag, 28. Februar 2013, 23:16:01 schrieb Hans-Peter Jansen: > > Am Freitag, 1. März 2013, 08:25:20 schrieb NeilBrown: > > > On Thu, 28 Feb 2013 11:49:53 +0100 Hans-Peter Jansen <hpj@xxxxxxxxx> > > > wrote: > > > > > > This is what I asked for: > > > > > It really feels like a udev bug, but it is hard to be sure. > > > > > Could you edit /etc/init.d/boot.md and add > > > > > > > > > > echo BEFORE mdadm -IRs > /dev/kmsg > > > > > > > > > > just before the call the "$mdadm_BIN -IRs", > > > > > > > > > > echo AFTER mdadm -IRs > /dev/kmsg > > > > > > > > > > just after that call, and > > > > > > > > > > echo AFTER mdadm -Asc > /dev/mksg > > > > > > > > > > just after the call to "$mdadm_BIN -A -s -c $mdadm_CONFIG" > > > > > > > > > > And maybe put similar messages just before and after the > > > > > > > > > > /sbin/udevadm settle .... > > > > > > > > > > call. > > > > > > > > > > If you can then reproduce the problem, the extra logging might tell us > > > > > something useful. > > > > > > This is not at all the same thing: > > > > I've enabled initrd (linuxrc=trace) and boot.md (#!/bin/sh -x) > > > > debugging, > > > > hence we should see the full story next time. > > > > > > It might help, but I'm far from certain that it will be nearly as useful. > > I see, sh debug messages do not condense in /var/log/something, they're routed > somewhere else. I not sure, how I like that systemd implied sillyness. > > > Okay, okay, I do it both ways now.. > > > > Let's see, when it triggers. > > I'm not sure, how the basic kernel boot contribute to boiling this issue down, > but here we go: (not censored this time, hope it will go though..) Thanks. The interesting bit is here: > Mar 3 09:11:59 zaphkiel kernel: [ 28.781703] md: bind<sdb4> It seems that udev is running "mdadm -I" on sdb4 here > Mar 3 09:11:59 zaphkiel kernel: [ 28.799939] tveeprom 1-0050: audio processor is MSP3415 (idx 6) > Mar 3 09:11:59 zaphkiel kernel: [ 28.862699] sr 7:0:0:0: Attached scsi generic sg6 type 5 > Mar 3 09:11:59 zaphkiel kernel: [ 28.894667] tveeprom 1-0050: has radio > Mar 3 09:11:59 zaphkiel kernel: [ 28.917194] bttv0: Hauppauge eeprom indicates model#44354 > Mar 3 09:11:59 zaphkiel kernel: [ 28.984037] bttv0: tuner type=20 > Mar 3 09:11:59 zaphkiel kernel: [ 29.003670] md: bind<sdb1> and "mdadm -I sdb1" here > Mar 3 09:11:59 zaphkiel kernel: [ 29.026123] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:07.0/input/input5 > Mar 3 09:11:59 zaphkiel kernel: [ 29.098905] i2c-core: driver [msp3400] using legacy suspend method > Mar 3 09:11:59 zaphkiel kernel: [ 29.139227] AFTER udevadm settle --timeout=60 and now udev is "settled". This is much less than 60 seconds after the "BEFORE udevadm settle", so it didn't time out - it must have exhausted the queue. > Mar 3 09:11:59 zaphkiel kernel: [ 29.211092] i2c-core: driver [msp3400] using legacy resume method > Mar 3 09:11:59 zaphkiel kernel: [ 29.276147] msp3400 1-0040: MSP3415D-B3 found @ 0x80 (bt878 #0 [sw]) > Mar 3 09:11:59 zaphkiel kernel: [ 29.325203] md: bind<sdb2> But look - it seems that "mdadm -I /dev/sdb2" was just run - presumably by udev. > Mar 3 09:11:59 zaphkiel kernel: [ 29.391360] msp3400 1-0040: msp3400 supports nicam, BEFORE mdadm -IRs We managed to get two lines blended together here, but it is clear that this is just before "mdadm -IRs" runs. > Mar 3 09:11:59 zaphkiel kernel: [ 29.455183] mode is autodetect > Mar 3 09:11:59 zaphkiel kernel: [ 29.525553] md/raid1:md0: active with 1 out of 2 mirrors > And unsurprisingly, md0 is assembled with only on mirror because we have seen sdb1 but not sda1 > Mar 3 09:11:59 zaphkiel kernel: [ 29.572395] i2c-core: driver [tuner] using legacy suspend method > Mar 3 09:11:59 zaphkiel kernel: [ 29.642012] i2c-core: driver [tuner] using legacy resume method > Mar 3 09:11:59 zaphkiel kernel: [ 29.688786] md: bind<sda4> and here comes sda4. Presumably sda1 is attempted a little later, but as md0 is already assembled, mdadm refuses to do anything with it. So "mdadm -IRs" is being run too early, apparently because "udevadm settle" is exiting too early. However I have looked closely at the udev code, and run various tests, and cannot see how it could possibly exit early. I can imagine it exiting a bit late in some anomalous situations, but not early. The only way I can explain this is to suggest that something other than udev is running "mdadm -I". But I cannot imagine what would be doing that. If you put a "sleep 10" before the "mdadm -IRs", I suspect your problems would go away. It isn't really a nice solution, but it is the only one I can think of at the moment. NeilBrown
Attachment:
signature.asc
Description: PGP signature