On 5 Mar 2007, at 00:32, Anton Altaparmakov wrote:
On 5 Mar 2007, at 00:16, Jörn Engel wrote:
On Sun, 4 March 2007 14:38:13 -0800, Ulrich Drepper wrote:
When you do it like this, who can the kernel/filesystem
*guarantee* that
when the data is written there actually is room on the harddrive?
What you described seems like using truncate/ftruncate to
increase the
file's size. That is not at all what posix_fallocate is for.
posix_fallocate must make sure that the requested blocks on the
disk are
reserved (allocated) for the file's use and that at no point in the
future will, say, a msync() fail because a mmap(MAP_SHARED) page has
been written to.
That actually causes an interesting problem for compressing
filesystems.
The space consumed by blocks depends on their contents and how
well it
compresses. At the moment, the only option I see to support
posix_fallocate for LogFS is to set an inode flag disabling
compression,
then allocate the blocks.
But if the file already contains large amounts of compressed data, I
have a problem. Disabling compression for a range within a file
is not
supported, so I can only return an error. But which one?
I don't know how your compression algorithm works but at least on
NTFS that bit is easy: you allocate the blocks and mark them as
allocated then the compression engine will write non-compressed
data to those blocks. Basically it works like this "does
compression block X have any sparse blocks?". If the answer is
"yes" the block is treated as compressed data and if the answer is
"no" the block is treated as uncompressed data. This means that if
the data cannot be compressed (and in some cases if the data
compressed is bigger than the data uncompressed) the data is stored
non-compressed. That is the most space efficient method to do things.
An alternative would be to allocate blocks and then when the data
is written perform the compression and free any blocks you do not
need any more because the data has shrunk sufficiently. Depending
on the implementation details this could potentially create
horrible fragmentation as you would allocate a large consecutive
region and then go and drop random blocks from that region thus
making the file fragmented.
And another thing you could do (best if you support journalling)
would be to do the allocation and hang the details off the inode on a
"preallocation list" of some kind and then as the data gets written
use blocks from the preallocation list as you go along. This would
avoid the fragmentation issue for example. You could then free the
surplus blocks when the whole range of the file being covered by the
preallocation list has been written to and/or when the file is closed
for the last time (drop_inode/delete_inode).
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html