On 11/03/2014 08:30 PM, Hugh Dickins wrote:
On Wed, 29 Oct 2014, Josef Bacik wrote:
One of the rocksdb people noticed that when you do something like this
fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
pwrite(fd, buf, 5M, 0)
ftruncate(fd, 5M)
on tmpfs the file would still take up 10M, which led to super fun issues
because we were getting ENOSPC before we thought we should be getting ENOSPC.
This patch fixes the problem, and mirrors what all the other fs'es do. I tested
it locally to make sure it worked properly with the following
xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file
Without the patch we have "Blocks: 20480", with the patch we have the correct
value of "Blocks: 10240". Thanks,
Signed-off-by: Josef Bacik <jbacik@xxxxxx>
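
For readers who want to reproduce the sequence without xfs_io, it can be
sketched in C. This is a minimal sketch, not part of the patch: the helper
name falloc_write_truncate and its signature are made up for illustration,
and it assumes Linux with glibc's fallocate() wrapper. It reports st_blocks
(512-byte units) before and after the truncate and removes the test file
before returning.

```c
#define _GNU_SOURCE          /* for fallocate() and FALLOC_FL_KEEP_SIZE */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): preallocate falloc_len bytes
 * with FALLOC_FL_KEEP_SIZE, write write_len zero bytes at offset 0, then
 * truncate to trunc_len. Stores st_blocks before and after the truncate.
 * Returns 0 on success, -1 on any failure (for example if the filesystem
 * does not support fallocate).
 */
static int falloc_write_truncate(const char *path, off_t falloc_len,
                                 size_t write_len, off_t trunc_len,
                                 long long *before, long long *after)
{
    struct stat st;
    size_t done = 0;
    char *buf = NULL;
    int ret = -1;
    int fd;

    assert(before != NULL && after != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    buf = calloc(1, write_len ? write_len : 1);
    if (buf == NULL)
        goto out;

    /* Reserve blocks without changing i_size. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, falloc_len) != 0)
        goto out;

    /* pwrite() may write less than requested, so loop until done. */
    while (done < write_len) {
        ssize_t n = pwrite(fd, buf + done, write_len - done, (off_t)done);
        if (n <= 0)
            goto out;
        done += (size_t)n;
    }

    if (fstat(fd, &st) != 0)
        goto out;
    *before = (long long)st.st_blocks;

    if (ftruncate(fd, trunc_len) != 0)
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    *after = (long long)st.st_blocks;
    ret = 0;
out:
    free(buf);
    close(fd);
    unlink(path);
    return ret;
}
```

Calling it with a path on a tmpfs mount and the sizes 10M, 5M and 5M
corresponds to the xfs_io command above; the "after" value is the number
to compare against the "Blocks:" figures quoted in the description.
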
That is a very good catch, and thank you for the patch. But I am not
convinced that the patch is correct - even if it does happen to end
up doing what other filesystems do here (I haven't checked).
Your patch makes it look like a fix to an off-by-one, but that is
not really the case. What if you change your final ftruncate(5M)
to ftruncate(6M): what should happen then?
My intuition says that what should happen is that i_size is set to 6M,
and the fallocated excess blocks beyond 6M be trimmed off: so that
it's both an extending and a shrinking truncate at the same time.
And I think that behavior would be served by removing the
"if (newsize < oldsize)" condition completely.
But perhaps I'm wrong: can you or anyone shed more light on this,
or point to documentation of what should happen in these cases?
Yup there's a section in the ftruncate manpage that specifically says
"expanding truncate is for losers."
Dave, do you want to weigh in here? Looking at both btrfs and xfs we only
do the trimming if newsize <= oldsize. So if you falloc up to 10M,
write 5M, and truncate up to 6M there is no trimming. I'd say this is
ok since it's an expanding truncate: people doing this are probably
going to want to keep the extra space, as opposed to those who falloc a
chunk and then truncate down to the amount they actually wrote. Thoughts?
Josef
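
The three candidate behaviours discussed in this thread can be compared
with a small userspace model. This is only a sketch under simplifying
assumptions: it is not kernel code, the names trim_policy and
alloc_after_truncate are hypothetical, allocation is treated as a single
byte range starting at offset 0, and block-size rounding is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* When does truncate trim blocks allocated beyond the new size? */
enum trim_policy {
    TRIM_IF_SHRINKING,   /* newsize <  oldsize: the condition quoted above */
    TRIM_IF_NOT_GROWING, /* newsize <= oldsize: as described for btrfs/xfs */
    TRIM_ALWAYS          /* condition removed entirely, as suggested above */
};

/*
 * Model of the bytes left allocated after truncating a file whose i_size
 * is oldsize and whose blocks cover [0, allocated) to newsize.
 */
static uint64_t alloc_after_truncate(enum trim_policy policy,
                                     uint64_t oldsize, uint64_t allocated,
                                     uint64_t newsize)
{
    int trim = 0;

    switch (policy) {
    case TRIM_IF_SHRINKING:
        trim = newsize < oldsize;
        break;
    case TRIM_IF_NOT_GROWING:
        trim = newsize <= oldsize;
        break;
    case TRIM_ALWAYS:
        trim = 1;
        break;
    }

    /* Trimming only ever removes allocation beyond the new size. */
    if (trim && allocated > newsize)
        return newsize;
    return allocated;
}

/* The two scenarios from the thread: falloc 10M, write 5M, then truncate. */
static void check_thread_scenarios(void)
{
    const uint64_t oldsize = 5 * MiB, allocated = 10 * MiB;

    /* ftruncate to 5M: only the strict "<" test keeps the excess. */
    assert(alloc_after_truncate(TRIM_IF_SHRINKING, oldsize, allocated, 5 * MiB) == 10 * MiB);
    assert(alloc_after_truncate(TRIM_IF_NOT_GROWING, oldsize, allocated, 5 * MiB) == 5 * MiB);
    assert(alloc_after_truncate(TRIM_ALWAYS, oldsize, allocated, 5 * MiB) == 5 * MiB);

    /* ftruncate to 6M: only the unconditional trim drops the excess. */
    assert(alloc_after_truncate(TRIM_IF_SHRINKING, oldsize, allocated, 6 * MiB) == 10 * MiB);
    assert(alloc_after_truncate(TRIM_IF_NOT_GROWING, oldsize, allocated, 6 * MiB) == 10 * MiB);
    assert(alloc_after_truncate(TRIM_ALWAYS, oldsize, allocated, 6 * MiB) == 6 * MiB);
}
```

In this model the "<" and "<=" conditions differ only when newsize equals
oldsize, while the 6M case is where "<=" and the unconditional trim part
ways, which is the open question in the thread.
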