Hi. I think I have found a regression introduced by commit 9d0be50
"ext4: Calculate metadata requirements more accurately".

I am using ext4 on an NFSv4 server running an unpatched 2.6.33.3
kernel. The client is currently running unpatched 2.6.33.3 as well,
although I also saw the problem with the client running 2.6.32.10.

The output from 'df' on the client varies wildly in the presence of
certain writes. I have not pinned down an exact write pattern that
causes it, but I do have an application that triggers it fairly
reliably. When the bug happens, I see swings like this:

Sun May 9 23:04:58 2010 blocks=961173888 available=28183168
Sun May 9 23:04:59 2010 blocks=961173888 available=12823424
Sun May 9 23:05:00 2010 blocks=961173888 available=28183040

(produced by a script that checks statvfs output every second; units
are kB, and the df output is effectively identical)

There is no possible way this system could write and then erase 15GB
of disk space in one second, as the drive can sustain only about
40MB/sec.

This problem is not present in any of the 2.6.32.* kernels. I used git
bisect to narrow the range down to between 3e8d95d (good) and 741f21e8
(bad) before giving up, because the kernels in between would oops
before I could test them. There are two ext4 patches in that range,
9d0be50 and ee5f4d9; the other patches are S390 and SH arch fixes.

I checked out a copy of 2.6.33.3 and reverted commit 9d0be50. There
was a small conflict in fs/ext4/inode.c which I merged by hand. The
resulting kernel has not exhibited the problem in an hour of testing,
whereas previously I could trigger it within a minute or two.

If you need more information, I can gladly provide it.
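For reference, a minimal sketch of such a statvfs polling loop (this is
not my exact script; Python and the /mnt/test mount point are only
illustrative) would be:

import os
import time

MOUNTPOINT = "/mnt/test"  # placeholder; substitute the real NFS mount point

while True:
    st = os.statvfs(MOUNTPOINT)
    # f_frsize is the fragment size; convert the block counts to 1 kB units
    blocks = st.f_blocks * st.f_frsize // 1024
    avail = st.f_bavail * st.f_frsize // 1024
    print("{0} blocks={1} available={2}".format(time.ctime(), blocks, avail))
    time.sleep(1)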
-- 
Bruce Guenter <bruce@xxxxxxxxxxxxxx>                http://untroubled.org/