https://bugzilla.kernel.org/show_bug.cgi?id=151491

Eric Whitney (enwlinux@xxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enwlinux@xxxxxxxxx

--- Comment #10 from Eric Whitney (enwlinux@xxxxxxxxx) ---
I've been able to reproduce the reported problem on my test system running a
4.14 x86-64 kernel with the supplied test script. Thanks for supplying it!

The block-reporting errors from du and df are likely caused by delayed
allocation accounting bugs. Experiments with an instrumented kernel show that
the number of delayed-allocated blocks is occasionally overcounted as the test
files are physically allocated, leaving a residual value behind once
allocation is complete. This residual value remains after a file has been
fully written out or deleted, and distorts the results reported by du or df.
Interestingly, the overcounting isn't deterministic and varies from run to
run.

Part of the overcounting appears to be due to code in ext4_ext_map_blocks()
that increases i_reserved_data_blocks when new clusters are allocated. This
code has previously been implicated in other observed failures, and in this
case it appears to contribute some, but not always all, of the overcounted
clusters seen when running the test script. Kernel traces indicate that there
is usually another, as yet unknown, contributor to the overcount.

Ted has suggested a temporary workaround that can be used to avoid the
reported problems, though it may have a significant workload-dependent
performance impact. Delayed allocation can simply be disabled by using the
nodelalloc mount option. I've tested this with repeated runs of the supplied
test script, and it avoids the reported problems as expected.

Reverting "ext4: don't release reserved space for previously allocated
cluster" (9d21c9fa2cc2) isn't an attractive option because doing so would
expose users to potential data loss.
The purpose of the patch was to fix cases where the number of outstanding
delayed allocation blocks was undercounted. Undercounting can lead to
unexpected free space exhaustion at writeback time, among other things.

I'll see what more I can learn from some additional experimentation.
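
For anyone trying the nodelalloc workaround described above, it can be useful to confirm that the option is actually in effect on the filesystem under test. The following is only a minimal sketch, not part of the original report: it assumes the kernel lists nodelalloc among the options in /proc/mounts for an ext4 filesystem mounted that way, and the function names and the sample device and mount point are hypothetical.

```python
# Sketch: check whether a mount point is mounted with nodelalloc by parsing
# text in /proc/mounts format (device, mount point, fstype, options, ...).
# Assumption: ext4 reports "nodelalloc" in the options field when delayed
# allocation is disabled. Function names here are illustrative only.


def mount_options(mounts_text, mountpoint):
    """Return the option list for mountpoint, or None if it is not listed."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Note: /proc/mounts escapes spaces in paths as \040, so such
        # mount points must be passed in their escaped form.
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Keep the last match: a later mount shadows an earlier one.
            found = fields[3].split(",")
    return found


def delalloc_disabled(mounts_text, mountpoint):
    """True if mountpoint is listed and its options include nodelalloc."""
    opts = mount_options(mounts_text, mountpoint)
    return opts is not None and "nodelalloc" in opts


# Example use on a live system (the mount point is hypothetical):
#     with open("/proc/mounts") as f:
#         print(delalloc_disabled(f.read(), "/mnt/test"))
```

A False result means either that the mount point was not found or that nodelalloc is not among its options, in which case delayed allocation is presumably still enabled and the du/df discrepancy can still occur.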