Hello,

On Tue, Dec 03, 2013 at 08:28:43AM -0600, Josh Hunt wrote:
> You're right. Thanks for pointing this out. I did not realize there
> was a bug in the init script. The version of initramfs-tools I was
> using had the following bug:
> https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1215911
>
> Updating to 0.99ubuntu13.4 of initramfs-tools resolved my boot hangs.
>
> I did try using the workaround as suggested by Linus. In my setup the
> dm_init() code was hit, however it still appeared to be too late at
> times. I also tried moving the call to async_synchronize_full() above
> the for loop and it still had the same issue (patch attached.) Out of
> around 10 reboot tests it failed to find root 1 or 2 times.
>
> The ubuntu scripts don't ever actually call do_mount() if it can't
> find the device. It seems to rely on some udev functionality to tell
> it when the device is present, and if that fails it just bails out.
>
> This change has introduced a regression. However, I only noticed it
> b/c my init script had a bug which caused it not to wait around for
> the device to appear.

Hmmm.... so, I read the bug report, dug in and asked around a bit.
Here's the root problem: ubuntu's initramfs uses a tool to wait for
the root device, and that tool uses libudev to listen for the device
event. Unfortunately, its rx buffer is not set large enough and the
receiver isn't fast enough, which means that netlink broadcast
messages from the kernel can overrun the buffer. When that happens,
the kernel sets an error on the socket, so the next recv fails with
-ENOBUFS. If that happens, the wait for root aborts immediately and
initramfs proceeds to mount a non-existent root device. The only
thing these patches change is the timing of events.
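To make the failure mode concrete, here is a minimal userspace sketch of what a root-device waiter could do differently. It is NOT the code of the ubuntu tool or of libudev; the helpers classify_recv_error(), enlarge_rcvbuf() and open_uevent_socket() and the enum wait_action are hypothetical names for illustration. It assumes Linux, that group 1 is the raw kernel uevent multicast group, and that the caller can re-check for the device (e.g. in sysfs or /dev) when events were lost. The key point is that -ENOBUFS only reports dropped broadcasts; the socket itself stays usable, so it should trigger a rescan, not an abort.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical decision a root-device waiter makes after recv() fails. */
enum wait_action {
    WAIT_RETRY,  /* transient condition: just call recv() again */
    WAIT_RESCAN, /* uevents were dropped: re-check for the device, keep waiting */
    WAIT_ABORT   /* the socket is genuinely broken */
};

/* Map recv()'s errno to an action. ENOBUFS on a netlink socket means
 * the kernel dropped broadcast messages because the receive queue was
 * full; the error is reported once and the socket stays usable, so
 * aborting the whole wait on it is the wrong response. */
enum wait_action classify_recv_error(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
        return WAIT_RETRY;
    case ENOBUFS:
        return WAIT_RESCAN;
    default:
        return WAIT_ABORT;
    }
}

/* Ask for a larger receive buffer. SO_RCVBUFFORCE bypasses the
 * net.core.rmem_max cap but needs CAP_NET_ADMIN; otherwise fall back
 * to SO_RCVBUF, which the kernel silently caps at rmem_max. Returns
 * the size reported back by getsockopt() (Linux reports double the
 * request to account for overhead), or -1 on error. */
int enlarge_rcvbuf(int fd, int bytes)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE, &bytes, sizeof(bytes)) < 0 &&
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return -1;
    return val;
}

/* Open a socket on the kernel's uevent broadcast (assumed group 1,
 * the raw kernel group the udev daemon itself listens on). */
int open_uevent_socket(int rcvbuf_bytes)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_DGRAM | SOCK_CLOEXEC,
                    NETLINK_KOBJECT_UEVENT);
    if (fd < 0)
        return -1;
    /* Best effort: enlarge the buffer before bind() so it is in place
     * before any broadcast can be queued. */
    enlarge_rcvbuf(fd, rcvbuf_bytes);
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;    /* let the kernel assign a port id */
    addr.nl_groups = 1; /* kernel uevent multicast group */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Enlarging the buffer only makes an overrun less likely; with no interlocking, a burst of events can still overflow it. Treating ENOBUFS as "rescan and keep waiting" is what actually closes the hole in the wait logic.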
The problem likely wasn't as exposed before because things were slow
enough that either the messages could be consumed fast enough or
there was enough delay between the libata module load and the root
device wait to hide the bug in the wait logic. So, yeah, it's a
full-blown timing bug. I'm not sure what we can do to work around it
from the kernel side except for randomly slowing things down or
forcefully enlarging the rx buffer size. There really is no
interlocking to take advantage of. :(

Thanks.

-- 
tejun