>> However, I am a fan of having *default* physical
>> preallocation, because as a rule one can trade off space for
>> speed nowadays, and padding files with "future growth" tails
>> is fairly cheap, and one could modify the filesystem code or
>> 'fsck' to reclaim unused space in "future grown" tails.

> When writing lots of small files (e.g. unpacking a kernel
> tarball),

But the big deal here is that this is not something that a
filesystem targeted at high bandwidth multistreaming loads
should be optimized for, at least by default.

While a number of people writing to this mailing list try to use
XFS as a small-records DBMS, we read their silly posts here
precisely because that is a stupid idea and thus it works badly
for them, and so they complain. All the guys who use XFS mostly
for the workloads it is good for don't complain, so we don't
read about them here that much. There is simply no way to
optimize for both workloads, because they require completely
different strategies.

BTW, '.a' files exist precisely because "many small files" was
considered a bad idea for '.o' files; I wish that the original
UNIX guys had used '.a' files much more widely, at the very
least for '.h' files and 'man' pages and configuration files.
One of the several details that the BTL guys missed (apart from
'creat' I mean :->).

> leaving space at the file tail due to preallocation (that will
> never get used)

It will get used, e.g. by 'fsck' or the filesystem itself (let's
call it "grey" space, and say that it will be reclaimed either
periodically or after all free space has been exhausted).

The overall issue with the future tense "will never" is that by
default you can only have _adaptive_ strategies, while you need
_predictive_ ones to handle completely different workloads well.
That is, predictive at the kernel level (and then good luck
making good guesses, even if in some useful simple cases it is
possible) or at the app level (and then userspace sucks, but at
least it's their fault).

> means that the file data is now sparse on disk instead of
> packed into adjacent blocks.

But even in your case of lots of small files without "tails"
that doesn't quite work, because you get lots of little files
that are not guaranteed to be contiguous to each other; and
anyhow, if the tails are not large, then even with tails the
files can be "nearly" contiguous (mostly on the same track).

> The result? Instead of the elevator merging adjacent file data
> IOs into a large IO, they all get issued individually,

Modern IO subsystems do scatter/gather and mailboxing quite
well; even SATA/SAS hardware nowadays handles much of that.

> and the seek count for IO goes way up.

Not necessarily -- because many/most files end up together on
the same track/cylinder where they would have ended up without
tails. Anyhow, as to the "many small records DBMS" case:

> Not truncating away the speculative preallocation beyond EOF
> will cause this,

It will cause some space diffusion, but again you cannot do both
multistreaming high bandwidth and singlestreaming lots of small
files well, and a filesystem should not do the latter anyhow.

And even so, if for example userspace hints to the filesystem
that a file is very unlikely to be modified or appended to
(which is the default for most apps, and could be inferred for
example from missing 'w' permissions or other details), the
filesystem need not put tails on it. Even heuristics based on
how many writes (and at what intervals) and/or seeks have
occurred can be used to assess the need for tails.
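Just to make the "growth tail" mechanics concrete, here is a
minimal userspace sketch, assuming Linux and a filesystem that
honours 'fallocate' (the tail-size policy is entirely made up,
just to illustrate the kind of heuristic above). It reserves
space beyond EOF with FALLOC_FL_KEEP_SIZE -- more or less the
"grey" space I mentioned: blocks that are allocated but not part
of the visible file size, which a later 'ftruncate' or an
'fsck'-like reclaimer could give back:

/* A minimal sketch, assuming Linux and a filesystem that honours
   fallocate(2). The tail-size policy below is hypothetical, purely
   illustrative of the kind of adaptive heuristic described above. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical policy: no growth tail for files the owner cannot
   write to (unlikely to be appended to); otherwise reserve half
   the current size again, with a floor of one 4KiB block. */
static off_t growth_tail(const struct stat *st)
{
    if (!(st->st_mode & S_IWUSR))
        return 0;
    off_t tail = st->st_size / 2;
    return tail < 4096 ? 4096 : tail;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* FALLOC_FL_KEEP_SIZE reserves blocks past EOF without changing
       the visible file size: "grey" space that a later ftruncate()
       to the current size (or an fsck-like reclaimer) can remove. */
    off_t tail = growth_tail(&st);
    if (tail > 0 && fallocate(fd, FALLOC_FL_KEEP_SIZE, st.st_size, tail) < 0)
        perror("fallocate");

    close(fd);
    return 0;
}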
But again, for the types of workloads XFS targets, one can take
the brute force approach of defaulting to tails, and it does not
cost anywhere near this:

> too, and that slows down such workloads by an order of
> magnitude....

That sounds way too much. First consider that if a file is less
than 2KiB in size it already has an "internal" tail (in a 4KiB
block) longer than itself, and that's already a very bad idea
for small files, as it makes for many more IOPs and more IO bus
bandwidth than needed, never mind the cache and RAM space. If
you are targeting workloads like that, the greatest improvement
is just reducing the block size (if you do, please scale the
following numbers accordingly).

Then suppose that each nearly-4KiB file gets a 4KiB tail; this
simply reduces space density by a factor of around 2x (it makes
a bad situation only 2x worse :->). In other words, leaving (for
a while) "tails" on small files behaves somewhat like (but much
better than) doubling the block size to 8KiB, which is not a
catastrophe (the catastrophe already happened when default block
sizes became 4KiB). Unless you really want to argue that
switching from 4KiB to 8KiB blocks is going to cost 10x in
performance.

What I mean is that tails don't *have* to be done stupidly;
sure, there are a number of really dumb things in Linux IO (from
swap space prefetching to "plugging"), but a few things seem to
have been designed with some forethought, so one does not need
to assume that tails would necessarily be implemented that badly
(and no, I don't have the time, as usual, so I stay in my
"armchair").
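And just to put numbers on that 2x claim, a toy program (all
file sizes illustrative) that prints the space actually
allocated for a few sample files under plain 4KiB blocks, 4KiB
blocks plus one spare tail block, and 8KiB blocks; the "tail"
column tracks the 8KiB column within a small factor, which is
the point:

/* Back-of-the-envelope check of the 2x density claim above:
   one spare tail block per small file costs about the same as
   doubling the block size, not a 10x slowdown. The sample file
   sizes are illustrative only. */
#include <stdio.h>

/* Blocks needed to hold `size` bytes with block size `bsize`. */
static long blocks(long size, long bsize)
{
    return (size + bsize - 1) / bsize;
}

int main(void)
{
    const long sizes[] = { 500, 2000, 3900, 6000, 12000 };
    puts("  bytes  4KiB  4KiB+tail  8KiB");
    for (unsigned i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        long s = sizes[i];
        long b4 = blocks(s, 4096);
        printf("%7ld %5ld %10ld %5ld\n",
               s, b4 * 4096, (b4 + 1) * 4096, blocks(s, 8192) * 8192);
    }
    return 0;
}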