[Bug 151491] free space lossage on busy system with bigalloc enabled and 128KB cluster


 



https://bugzilla.kernel.org/show_bug.cgi?id=151491

Eric Whitney (enwlinux@xxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enwlinux@xxxxxxxxx

--- Comment #10 from Eric Whitney (enwlinux@xxxxxxxxx) ---
I've been able to reproduce the reported problem on my test system running a
4.14 x86-64 kernel with the supplied test script.  Thanks for supplying it!

The block reporting errors from du and df are likely caused by delayed
allocation accounting bugs.  Experiments with an instrumented kernel show that
the number of delayed allocation blocks is occasionally overcounted as the test
files are physically allocated, leaving a residual value behind once allocation
is complete.  This residual persists after a file has been fully written out or
even deleted, and distorts the results reported by du or df.  Interestingly,
the overcounting isn't deterministic and varies from run to run.

Part of the overcounting appears to be due to code in ext4_ext_map_blocks()
that increases i_reserved_data_blocks when new clusters are allocated.  This
code has been implicated in other observed failures, and in this case it
appears to contribute some, but not always all, of the overcounted clusters
seen when running the test script.  Kernel traces indicate that there is
usually another, as yet unknown, contributor to the overcount.

Ted has suggested a temporary workaround which can be used to avoid the
reported problems, though it may have a significant workload-dependent
performance impact.  Delayed allocation can simply be disabled by using the
nodelalloc mount option.  I've tested this with repeated runs of the supplied
test script, and it avoids the reported problems as expected.

Reverting "ext4: don't release reserved space for previously allocated cluster"
(9d21c9fa2cc2) isn't an attractive option because doing so would expose users
to potential data loss.  The purpose of the patch was to fix cases where the
number of outstanding delayed allocation blocks was undercounted.
Undercounting can lead to unexpected free space exhaustion at writeback time,
among other things.

I'll see what more I can learn from some additional experimentation.



