On Tue, Feb 05, 2013 at 11:08:52PM -0500, Tom wrote:
> In a previous message, Dave Chinner wrote:
> >
> > Find out if the unmount is returning an error first. If there is no
> > error, then you need to find what is doing bind mounts on your
> > system and make sure they are unmounted properly before the final
> > unmount is done. If lazy unmount is being done, make it a normal
> > unmount and see where the unmount is getting stuck or taking time to
> > complete by using sysrq-w if it gets delayed for any length of time.
>
> OK, here is what I did tonight. I added debug toward the end of
> /etc/rc.d/rc6.d/S01reboot ...where the umounts are normally handled.
> DEBUG: remounting '/' as read-only using 'mount -n -o ro,remount'
> DEBUG: remounting '/proc' as read-only using 'mount -n -o ro,remount'
> mdadm: failed to set readonly for /dev/md3: Device or resource busy

EBUSY means one of two possibilities:

	1. there's a file still open for write. => lsof
	2. there's an unlinked but still open file => lsof

But I don't think that's the problem at all.

> Please stand by while rebooting the system...
> md: stopping all md devices.
> md: md2 switched to read-only mode.
> md: md1 switched to read-only mode.
> (hang)
>
> Just for kicks, I get the same output with the 308 kernel, with the
> addition of this:
>
> md: md3 still in use.

Which implies that the problem is a change in behaviour in the md
layer or below. i.e. previously md just saw that it was busy and did
not try to tear down the device. Now it is trying to tear down the
device with a filesystem that is still active on it.

> But the same system happily reboots just fine with the 308 kernel even
> after producing that "still in use" message that 348 does not produce.

Right, because it correctly detects the filesystem is still in use
and doesn't try to tear down the device.

> I did some more experiments with mdadm and I can't get any underlying
> md device to go into read-only mode even if the fs is mounted read-only.
> The only way I could get that to work is if the fs is completely unmounted.
> Whether it is XFS or ext3. Yet a system on ext3 reboots fine.

And that will be because ext3 won't be issuing any IO on the sync
that is triggered when tearing down the MD device. XFS is writing the
superblock, and that's where the MD device is hanging on itself.

> Is there more specific information that I can gather that may help?

No need - I can tell you the exact commit in the RHEL 5.9 tree that
caused this regression:

11ff4073: [md] Fix reboot stall with raid on megaraid_sas controller

The result is that the final shutdown of md devices now uses a
"force readonly" method, which means it ignores the fact that a
filesystem may still be active on top of it and rips the device out
from under the filesystem.

This really only affects root devices, and given that XFS is not
supported as a root device on RHEL, it isn't in the QE test matrix
and so the problem was never noticed.

Feel free to report this all to the RH bugzilla - depending on the
implications of the regression for supported configurations, it may
need to be fixed in RHEL anyway. But now you know the problem, you
can probably fix it yourself rather than have to wait for RHEL/CentOS
product cycle updates...

Cheers,

Dave.

PS: has the fact I quoted a RHEL5.9 commit id triggered a lightbulb
moment for you yet? Hint: my other email address is
dchinner@xxxxxxxxxx - this XFS community support effort was brought
to you by Red Hat.

--
Dave Chinner
david@xxxxxxxxxxxxx
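
For anyone chasing the same hang, here is a rough sketch of one way to see which filesystems are still sitting on md devices, and whether they are read-write, just before the final md shutdown. It only parses text in the /proc/mounts format. The function name and the sample data are made up for illustration and are not taken from the system discussed above.

```python
# Hypothetical sample in /proc/mounts format (device, mountpoint, fstype, options, ...).
SAMPLE = """\
/dev/md3 / xfs rw,noatime 0 0
/dev/md1 /boot ext3 ro 0 0
proc /proc proc rw 0 0
/dev/md3 /var/chroot/lib xfs rw,noatime 0 0
"""


def md_backed_mounts(mounts_text):
    """Return (device, mountpoint, fstype, mode) for every mount on /dev/md*.

    mode is "ro" if the option list contains "ro", otherwise "rw".
    """
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if not device.startswith("/dev/md"):
            continue
        mode = "ro" if "ro" in options.split(",") else "rw"
        results.append((device, mountpoint, fstype, mode))
    return results


for entry in md_backed_mounts(SAMPLE):
    print("%s on %s type %s (%s)" % entry)
```

On a live system the input would be `open("/proc/mounts").read()` instead of the sample. A device that shows up more than once is worth a look, since a bind mount is listed with the same source device as the original mount. Some kernels report the root filesystem as `/dev/root` or `rootfs`, so the `/dev/md` prefix test may need adjusting.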