https://bugzilla.kernel.org/show_bug.cgi?id=45741

Theodore Tso <tytso@xxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@xxxxxxx

--- Comment #1 from Theodore Tso <tytso@xxxxxxx>  2012-08-09 18:10:59 ---
It's not scanning every single inode (that would take a lot longer!), but it is scanning every single block allocation bitmap.

The problem is that we know how many free blocks are in a block group, but we don't know the distribution of the free blocks. The distribution (there are X free extents of size 2**3, Y of size 2**4, etc.) is cached in memory, but the first time after you unmount and remount the file system, we need to read in the block bitmap for a block group. Normally, we only do this until we find a suitable group, but when the file system is completely full, we might need to scan the entire disk.

I've looked at mballoc, and there are some things we can fix on our side. We're reading in the block bitmap without first checking to see if the block group is completely filled. So that's an easy fix on our side, which will help at least somewhat. So thanks for reporting this.

That being said, it's a really bad idea to try to fill a file system to 99%. Above 80%, file system performance definitely starts to fall off, and by the time you get up to 95%, performance is going to be really awful. There are definitely things we can do to improve things, but ultimately, it's something that you should plan for.

You could also try increasing the flex-bg size, which is a configuration knob set when the file system is formatted. This collects allocation bitmaps for adjacent block groups together. The default is 16, but you could try bumping that up to 64 or even 128.
It will reduce the time needed to scan all of the allocation bitmaps in the cold cache case, but it may also decrease performance after that, when you need to allocate and deallocate inodes and blocks, because it increases the distance from data blocks to the inode table. How well this tradeoff works is going to be very dependent on the details of your workload.
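
To make the two points above concrete (the per-group distribution of free extents, and skipping completely full groups before reading their bitmaps), here is a minimal Python sketch. It is an illustration only, not the mballoc code: the function names, the list-of-ints bitmap representation, and the read_bitmap callback are all hypothetical, and the way free runs are split into aligned power-of-two chunks is an assumption chosen for simplicity.

```python
from collections import Counter


def free_extent_orders(bitmap):
    """Count free chunks by order k, where a chunk covers 2**k blocks.

    bitmap is a sequence of 0/1 values; 1 means the block is in use.
    Each run of free blocks is split into naturally aligned
    power-of-two chunks (a simplified, buddy-style decomposition).
    """
    counts = Counter()
    n = len(bitmap)
    i = 0
    while i < n:
        if bitmap[i]:
            i += 1
            continue
        # Find the end of this run of free blocks: [i, j).
        j = i
        while j < n and not bitmap[j]:
            j += 1
        pos = i
        while pos < j:
            # Grow the chunk while it stays aligned and fits in the run.
            size = 1
            while pos % (size * 2) == 0 and size * 2 <= j - pos:
                size *= 2
            counts[size.bit_length() - 1] += 1
            pos += size
        i = j
    return counts


def find_group(free_counts, want_order, read_bitmap):
    """Return (group, bitmap_reads) for the first group that has a free
    chunk of at least 2**want_order blocks, or (None, bitmap_reads).

    free_counts[g] is the free-block count already known for group g.
    read_bitmap(g) stands in for the expensive cold-cache bitmap read.
    """
    reads = 0
    for g, free in enumerate(free_counts):
        if free == 0:
            continue  # completely full: skip without reading the bitmap
        reads += 1
        orders = free_extent_orders(read_bitmap(g))
        if any(k >= want_order and c > 0 for k, c in orders.items()):
            return g, reads
    return None, reads


# Hypothetical 8-block groups: full, fragmented, and half free.
bitmaps = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
free_counts = [b.count(0) for b in bitmaps]
result = find_group(free_counts, 2, lambda g: bitmaps[g])
```

In this toy layout, groups 1 and 2 both report four free blocks, but only group 2 has them in one aligned chunk of 2**2 blocks. That is the information the free-block count alone cannot give, and the full group 0 is passed over without its bitmap being read at all.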