Hi. I think I have found a regression introduced by commit 9d0be50
"ext4: Calculate metadata requirements more accurately".

I am using ext4 on an NFSv4 server running an unpatched 2.6.33.3
kernel. The client is currently running unpatched 2.6.33.3 as well,
although I also saw the problem with the client running 2.6.32.10.

The output from 'df' on the client varies wildly in the presence of
certain writes. I have not pinned down an exact write pattern that
causes it, but I do have an application that triggers it fairly
reliably. When the bug happens, I see swings like this:

Sun May 9 23:04:58 2010 blocks=961173888 available=28183168
Sun May 9 23:04:59 2010 blocks=961173888 available=12823424
Sun May 9 23:05:00 2010 blocks=961173888 available=28183040

(produced by a script that checks statvfs output every second; units
are kB, and the df output is effectively identical)

There is no possible way this system could write and then erase 15GB
of disk space in one second, as the drive can sustain only about
40MB/sec.

This problem is not present in any of the 2.6.32.* kernels. I used git
bisect to narrow the range down to between 3e8d95d (good) and 741f21e8
(bad) before giving up, because the kernels in between would oops
before I could test them. There are two ext4 patches in that range,
9d0be50 and ee5f4d9; the other patches are S390 and SH arch fixes.

I checked out a copy of 2.6.33.3 and reverted commit 9d0be50. There
was a small conflict in fs/ext4/inode.c which I merged by hand. The
resulting kernel has not exhibited the problem in an hour of testing,
whereas previously I could trigger it within a minute or two.

If you need more information, I can gladly provide it.
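For reference, a minimal sketch of such a statvfs polling loop (this is
not my exact script; Python and the /mnt/test mount point are only
illustrative) would be:

import os
import time

MOUNTPOINT = "/mnt/test"  # placeholder; substitute the real NFS mount point

while True:
    st = os.statvfs(MOUNTPOINT)
    # f_frsize is the fragment size; convert the block counts to 1 kB units
    blocks = st.f_blocks * st.f_frsize // 1024
    avail = st.f_bavail * st.f_frsize // 1024
    print("{0} blocks={1} available={2}".format(time.ctime(), blocks, avail))
    time.sleep(1)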
-- 
Bruce Guenter <bruce@xxxxxxxxxxxxxx>                http://untroubled.org/