More ext3 fileserver woes ...

neilb@cse.unsw.edu.au (Neil Brown) · Thu, 6 Jun 2002 10:14:52 +1000 (EST)

Well....  you might remember that I have had problems with my NFS
fileserver that runs ext3 with data=journal.

The filesystem corruption now seems to be solved with the patch (plus
amendment) that I posted, so I am happy about that... but there is
more.

I have known for a while that ext3 doesn't behave very well when the
journal fills up.  If it finds that the journal is full, and the
oldest transaction still has dirty buffers, the checkpoint code
(log_do_checkpoint) will, despite comments to the contrary, flush out
*all* dirty buffers that are attached to the journal, and will wait
for all of them to be written to disc.

This can cause my fileserver to pause for a number of seconds while
the journal empties, which is very noticeable to my customers.

There are (were) two easy workarounds for this.
1/ make the journal big enough that it will never fill.
2/ make the bdflush parameters small enough that data gets flushed out
  quickly enough that the journal never fills.

I have chosen the second and it works quite well... until now.

As part of fixing the corruption problem and the assertion failure
before that I have upgraded my fileservers to 2.4.18-pre9 plus
ext3-0.9.18.  Now the above workarounds don't work any more.

I periodically have my fileserver hang for 30 to 60 seconds while
there is lots of disc activity.  It looks very much like the journal
being flushed.
Back-of-the-envelope calculations suggest that the journal on one of
these machines would take 10 minutes to fill under a very high (but
realistic) load.  Under normal load I would expect longer.
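
An estimate of this kind amounts to dividing the journal size by the
rate at which data enters the journal (with data=journal, file data as
well as metadata passes through it).  A minimal sketch in Python
follows.  The journal size and write rate are made-up illustrative
figures, since the message does not state the real ones, and journal
overhead such as descriptor and commit blocks is ignored.

```python
def journal_fill_seconds(journal_bytes, write_bytes_per_sec):
    """Seconds to fill an empty journal if nothing is checkpointed meanwhile."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return journal_bytes / write_bytes_per_sec


# Hypothetical figures chosen for illustration, NOT taken from the message.
JOURNAL_BYTES = 600 * 1024 * 1024   # assumed 600 MiB journal
WRITE_RATE = 1024 * 1024            # assumed sustained 1 MiB/s of writes

fill = journal_fill_seconds(JOURNAL_BYTES, WRITE_RATE)
print(f"journal fills in about {fill / 60:.0f} minutes")
```

A lower write rate stretches the fill time proportionally, which is why
a longer interval would be expected under normal load.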

This seems to suggest that there are dirty buffers lying around that
are more than 10 minutes old.  Given that bdf_prm.b_un.age_buffer is
the default 3000, or 30 seconds, this seems like a problem.
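
The arithmetic behind that comparison can be written out as a short
Python sketch.  It assumes age_buffer is expressed in jiffies and that
HZ is 100, which is what makes 3000 correspond to 30 seconds; the
10-minute figure is the journal fill estimate quoted above.

```python
HZ = 100  # assumed timer frequency (jiffies per second)


def jiffies_to_seconds(jiffies, hz=HZ):
    """Convert a jiffies count to seconds for a given timer frequency."""
    return jiffies / hz


age_buffer_s = jiffies_to_seconds(3000)   # bdflush age_buffer default
fill_time_s = 10 * 60                     # estimated journal fill time
ratio = fill_time_s / age_buffer_s

print(f"age_buffer = {age_buffer_s:.0f} s, journal fill = {fill_time_s} s")
print(f"buffers would have to be about {ratio:.0f}x older than age_buffer")
```

If bdflush honoured age_buffer, no dirty buffer should survive anywhere
near long enough for the journal to fill.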

I have resolved the problem for now by running "sync" every minute
which seems to work, but is hardly elegant.
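
One way to express that stopgap is a small loop that calls sync once a
minute; a cron entry would do equally well.  This is only a sketch, not
the script actually used on the server.  The function name
periodic_sync and its parameters are hypothetical; os.sync() is the
standard Python call (Unix only) that asks the kernel to write out
dirty buffers.

```python
import os
import time


def periodic_sync(interval_s=60, iterations=None,
                  sync_fn=os.sync, sleep_fn=time.sleep):
    """Call sync_fn every interval_s seconds.

    iterations=None runs forever; a number stops after that many syncs.
    sync_fn and sleep_fn are injectable so the loop can be exercised
    without touching the disc or actually waiting.
    """
    count = 0
    while iterations is None or count < iterations:
        sync_fn()
        count += 1
        # Sleep only if another sync is going to follow.
        if iterations is None or count < iterations:
            sleep_fn(interval_s)
    return count


# To run as the workaround (loops forever, one sync per minute):
#     periodic_sync()
```

This keeps the amount of dirty data bounded by roughly one minute of
writes, at the cost of forcing a flush regardless of load.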

I'm not in a position at the moment to spend time testing whether the
inefficacy of bdflush is ext3 specific or applies to ext2 as well.  I
may try that next week (we have a long weekend coming up).  If it
applies to ext2, it could be a showstopper for 2.4.19....

NeilBrown