Hi,

On 1/15/25 17:34, Andreas Wagner wrote:
1. Do they really - as media and Wikipedia suggest - reduce allocated filesize if the file is sparse?
Yes. Unless preallocation is explicitly requested, blocks are allocated to a file when they are first written, and the file size is updated at the same time. So when you open a new file and write 8192 bytes to it on a file system with a 4096-byte block size, the write operation will allocate two blocks and set the file size to 8192.
If you seek to an offset of 8192 before writing, then the write will still allocate two blocks to hold the data, but the file size is set to 16384, and the first 8192 bytes read back as zeros. Because no blocks have been allocated for that first range, the allocated size is now smaller than the visible file size, and you have created a sparse file.
So there is no special procedure involved in creating a sparse file; it just happens if you're not writing the file from start to finish.
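To make the seek-then-write case concrete, here is a minimal sketch in Python. The function name make_sparse_file and the file name sparse.img are placeholders of mine, and the allocated size it reports depends on the file system (block size, delayed allocation, whether holes are supported at all), so treat the second number as indicative only.

```python
import os
import tempfile


def make_sparse_file(path, offset=8192, payload=b"x" * 8192):
    """Seek past the start of a new file, then write.

    Returns (visible size, allocated bytes as reported by the file system).
    """
    fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_RDWR, 0o600)
    try:
        os.lseek(fd, offset, os.SEEK_SET)  # skip the first `offset` bytes
        os.write(fd, payload)              # only this range is actually written
        st = os.fstat(fd)
    finally:
        os.close(fd)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        size, allocated = make_sparse_file(os.path.join(d, "sparse.img"))
        print("visible size:", size, "allocated:", allocated)
```

On a file system that supports holes, I would expect the allocated figure to come out smaller than the visible size of 16384, which is the same comparison that ls -l and du make.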
2. Are the blocks of the file in random order?
Because block allocation happens when the data is written, there is no guarantee that consecutive data in the file is stored in consecutive blocks on disk, regardless of whether the file is sparse or not.
If you use a file as backing storage for a virtual machine's hard disk, the access patterns involved will likely create a sparse file with a suboptimal on-disk layout. The only way to avoid this is to allocate the blocks before they are used.
If you want a good chance of getting consecutive blocks, you can use posix_fallocate(3) to request that file system blocks be allocated at once. This will grow the file to the requested size immediately (you cannot have allocated but inaccessible blocks), and since it is a single request, there is little chance of interference from other file system accesses that may also want to allocate blocks at the same time.
There is no guarantee, though. Only LVM exposes allocation policies, so if your desired performance or some other constraint requires contiguous allocation, you need to use a logical volume. For typical use cases, using posix_fallocate is sufficient to keep the number of separate allocations to a minimum.
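For reference, a rough sketch of the preallocation approach, using Python's os.posix_fallocate wrapper (available on Unix). The function name preallocate, the file name disk.img and the 1 MiB size are made up for illustration; the call asks for the blocks in one request, but it does not tell you how contiguous the result is.

```python
import os
import tempfile


def preallocate(path, size):
    """Create `path` and ask the file system to allocate `size` bytes up front.

    Returns (visible size, allocated bytes as reported by the file system).
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Wraps posix_fallocate(3); raises OSError if the request fails,
        # for example when the file system is out of space.
        os.posix_fallocate(fd, 0, size)
        st = os.fstat(fd)
    finally:
        os.close(fd)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        size, allocated = preallocate(os.path.join(d, "disk.img"), 1024 * 1024)
        print("visible size:", size, "allocated:", allocated)
```

Unlike the seek-then-write case, the file grows to the requested size immediately, and the allocated figure should normally be at least as large as the visible size.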
Simon