Re: [PATCH] generic/015: Change the test filesystem size to 101mb

Timofey Titovets <nefelim4ag@xxxxxxxxx> · Mon, 8 Jan 2018 16:55:01 +0300

2018-01-08 15:54 GMT+03:00 Qu Wenruo <quwenruo.btrfs@xxxxxxx>:
>
>
> On 2018年01月08日 16:43, Nikolay Borisov wrote:
>> This test has been failing for btrfs for quite some time,
>> at least since 4.7. There are 2 implementation details of btrfs that
>> it exposes:
>>
>> 1. Currently btrfs filesystem under 100mb are created in Mixed block
>> group mode. Freespace accounting for it is not 100% accurate - I've
>> observed around 100-200kb discrepancy between a newly created filesystem,
>> then writing a file and deleting it and checking the free space. This
>> falls within %3 and not %1 as hardcoded in the test.
>>
>> 2. BTRFS won't flush it's delayed allocation on file deletion if less
>> than 32mb are deleted. On such files we need to perform sync (missing
>> in the test) or wait until time elapses for transaction commit.
>
> I'm a little confused about the 32mb limit.
>
> My personal guess about the reason to delay space freeing would be:
> 1) Performance
>    Btrfs tree operation (at least for write) is slow due to its tree
>    design.
>    So it may makes sense to delay space freeing.
>
>    But in that case, 32MB may seems to small to really improve the
>    performance. (Max file extent size is 128M, delaying one item
>    deletion doesn't really improve performance)
>
> 2) To avoid later new allocation to rewrite the data.
>    It's possible that freed space of deleted inode A get allocated to
>    new file extents. And a power loss happens before we commit the
>    transaction.
>
>    In that case, if everything else works fine, we should be reverted to
>    previous transaction where deleted inode A still exists.
>    But we lost its data, as its data is overwritten by other file
>    extents. And any read will just cause csum error.
>
>    But in that case, there shouldn't be any 32MB limit, but all deletion
>    of orphan inodes should be delayed.
>
>    And further more, this can be addressed using log tree, to log such
>    deletion so at recovery time, we just delete that inode.
>
> So I'm wonder if we can improve btrfs deletion behavior.
>
>
>>
>> Since mixed mode is somewhat deprecated and btrfs is not really intended
>> to be used on really small devices let's just adjust the test to
>> create a 101mb fs, which doesn't use mixed mode and really test
>> freespace accounting.
>
> Despite of some btrfs related questions, I'm wondering if there is any
> standard specifying (POSIX?) how a filesystem should behave when
> unlinking a file.
>
> Should the space freeing be synchronized? And how should statfs report
> available space?
>
> In short, I'm wondering if this test and its expected behavior is
> generic enough for all filesystems.
>
> Thanks,
> Qu
>
>>
>> Signed-off-by: Nikolay Borisov <nborisov@xxxxxxxx>
>> ---
>>  tests/generic/015 | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tests/generic/015 b/tests/generic/015
>> index 78f2b13..416c4ae 100755
>> --- a/tests/generic/015
>> +++ b/tests/generic/015
>> @@ -53,7 +53,7 @@ _supported_os Linux
>>  _require_scratch
>>  _require_no_large_scratch_dev
>>
>> -_scratch_mkfs_sized `expr 50 \* 1024 \* 1024` >/dev/null 2>&1 \
>> +_scratch_mkfs_sized `expr 101 \* 1024 \* 1024` >/dev/null 2>&1 \
>>      || _fail "mkfs failed"
>>  _scratch_mount || _fail "mount failed"
>>  out=$SCRATCH_MNT/fillup.$$
>>
>

All fs, including btrfs (AFAIK) return unlink(), (if file not open)
only then space has been freed.
So free space after return of unlink() must be freed.
Proofs: [1] [2] [3]
[4] - Posix, looks like do not describe behaviour.

1. http://man7.org/linux/man-pages/man2/unlink.2.html
2. https://stackoverflow.com/questions/31448693/why-system-call-unlink-so-slow
3. https://www.spinics.net/lists/linux-btrfs/msg59901.html
4. https://www.unix.com/man-page/posix/1P/unlink/

Thanks.
-- 
Have a nice day,
Timofey.
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html