Theodore Tso wrote:
>
> On Fri, Mar 01, 2002 at 02:15:24PM -0800, Andrew Morton wrote:
> >
> > Mail produces "slow-growth" files.  Which means that their blocks
> > are sprinkled all over the disk.  If you're adding a few k per hour to
> > a file, the fs just about never manages to allocate the blocks
> > contiguously.  A while back, I had a six-month-old multi-megabyte
> > mailbox which had precisely *zero* contiguous blocks.  It was 100%
> > fragmented!
>
> Yeah, we really need to get preallocation working again for ext3,

I have half-a-patch for that.  It takes the preallocation out of the
bitmaps altogether, and puts it into (start_block, nr_blocks) in the
inode instead.  Which has the advantage that prealloc doesn't stumble
over stray already-used blocks.  And the prealloc window can be grown
dynamically, like readahead.  To larger values.  It doesn't require
tricky changes to the journalling, and does not need to differ from an
ext2 implementation.

I'll finish that off reasonably soon, I think.  I was for a while
hoping that delayed allocation would suffice to solve the problem.
And indeed it does.  But it's too big for 2.4 - much too big.

> and
> it would be useful if the filesystem could notice the mail case, and
> to not release the preallocated blocks back to the system when the
> file descriptor is closed.

mm.  Allocate-on-flush partially solves this.  Dropping the
preallocation at the right time is absolutely vital for the
many-small-file workloads.

> > For the above reasons, I partition my machines with all partitions
> > the same size, and keep one free.  For the monthly therapeutic
> > copy-all-files-and-switch-mountpoints speedup.
> >
> > It's all a bit sad, really.
>
> Well, perhaps it's time that someone rewrote the defragger to work with
> 4k blocks, and so that it doesn't leave your filesystem a smoking heap
> of debris if your system crashes in the middle of the defrag
> operation.  :-)

I have 100%-journalled pagecache-coherent online defrag code sitting here.
Haven't quite gotten around to designing the userspace bit yet.  :(
`cp -a' does the job.

> I haven't really noticed a major slowdown effect, but that's probably
> because I was used to the speed of using emacs RMAIL, and for large mail
> files, mutt is blazingly fast in comparison, fragmented files or no.
>
> As always, there's always more work that we could do to make things
> better, and not enough time to do it.  :-)

The algorithm for placing directory inodes is the biggest performance
problem in ext2 and ext3.  I did a truckload of work on that last year.
I ended up concluding that we need online defrag, which will enable
the placement of directory inodes in the same block group as their
parent.  We're talking a 5x speedup for some common workloads here.

-
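
For illustration only, here is a rough userspace sketch of the per-inode
preallocation idea described above: the window lives in the inode as
(start_block, nr_blocks), it grows on each refill much as a readahead
window does, and it is dropped explicitly.  This is not the patch
itself.  The names (ToyFs, Inode, MAX_WINDOW, drop_prealloc), the
doubling policy, the cap of 64 blocks, and the in-memory `reserved` set
are all assumptions made for the example.

```python
MAX_WINDOW = 64  # hypothetical cap on the window size, in blocks


class Inode:
    def __init__(self):
        self.blocks = []         # blocks allocated to the file, in file order
        self.prealloc_start = 0  # first unused block of the current window
        self.prealloc_len = 0    # blocks still left in the window (nr_blocks)
        self.window = 1          # size to ask for when the window is refilled


class ToyFs:
    def __init__(self, nr_blocks):
        self.used = [False] * nr_blocks  # stands in for the on-disk bitmap
        self.reserved = set()            # in-memory only: blocks held in windows

    def _busy(self, blk):
        return self.used[blk] or blk in self.reserved

    def _find_run(self, goal, want):
        """First free run at or after goal, at most `want` blocks long."""
        n = len(self.used)
        start = goal
        while start < n and self._busy(start):
            start += 1
        if start >= n:
            return None
        end = start
        while end < n and end - start < want and not self._busy(end):
            end += 1
        return start, end - start

    def _new_window(self, inode):
        # Aim just past the file's last block, falling back to block 0.
        goal = inode.blocks[-1] + 1 if inode.blocks else 0
        run = self._find_run(goal, inode.window) or self._find_run(0, inode.window)
        if run is None:
            raise OSError("no free blocks")
        inode.prealloc_start, inode.prealloc_len = run
        self.reserved.update(range(run[0], run[0] + run[1]))
        # Assumed policy: double the request each time, as readahead might.
        inode.window = min(inode.window * 2, MAX_WINDOW)

    def alloc_block(self, inode):
        """Hand out the next block from the inode's window, refilling if empty."""
        if inode.prealloc_len == 0:
            self._new_window(inode)
        blk = inode.prealloc_start
        inode.prealloc_start += 1
        inode.prealloc_len -= 1
        self.reserved.discard(blk)
        self.used[blk] = True  # only now does the block hit the bitmap
        inode.blocks.append(blk)
        return blk

    def drop_prealloc(self, inode):
        """Release the unused part of the window and reset its size."""
        end = inode.prealloc_start + inode.prealloc_len
        self.reserved.difference_update(range(inode.prealloc_start, end))
        inode.prealloc_len = 0
        inode.window = 1


def count_extents(blocks):
    """Number of contiguous runs in a file's block list."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)
```

Growing two files one block at a time in alternation, the slow-growth
mail case in miniature, gives each file a handful of extents in this
toy rather than one extent per block.  When to call drop_prealloc() is
the policy question raised above: too early and the mail case
fragments, too late and many-small-file workloads strand reserved
blocks.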