On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> TL;DR: we experience long temporary hangs when doing multiple mount -o
> remount at the same time as other I/O on an ext4 filesystem.
>
> Hi,
>
> When starting hundreds of LXC containers simultaneously on a system, the
> boot of some containers was hanging. We tracked this down to an
> initscript's use of mount -o remount, which was hanging in D state.
>
> We reproduced the problem outside of LXC, with the script available at
> [0]. That script initiates 1000 mount -o remount, and performs some
> writes using a big cp to the same filesystem during the remounts.
....
> Some other things we tried:
> 1) we tried to 'sync' after removing the files, and dropping the caches
> (as shown in the commented lines in [0]). That makes the problem disappear
> (or at least makes it less frequent). The overall script execution is
> actually faster with the post-rm sync and dropping caches than without
> them!
>
> 2) We tried switching to the noop scheduler (instead of cfq). The problem
> could still be reproduced. A btrace dump with noop is available at [2].
>
> 3) We tried with ext3 instead of ext4. The problem could never be
> reproduced.
>
> 4) We tried on different machines, and we could reproduce the problem.
> However, on a machine with SSD drives, we were not able to reproduce the
> problem.
>
> Any ideas?

If this really is caused by sync on ext4 being slow while there are
concurrent writers, then perhaps:

http://marc.info/?l=linux-ext4&m=139388721931428&w=2

is a possible fix...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html