Re: Periodic lockup problem with ext3

Andreas Dilger <adilger@clusterfs.com> · Wed, 9 Oct 2002 09:45:41 -0600

On Oct 09, 2002  17:17 +0200, Ralf Hildebrandt wrote:
> On several machines with ext3 we have a periodic "unresponsiveness"
> problem. 
> 
> Take for example our mailserver: When it handles a lot of
> email (lots of deliveries to Maildirs), it shovels the data into the
> Maildirs. 
> 
> But every now and then (the interval being >> 5s, the commit
> interval) the machine becomes unresponsive, you hear a lot of disk
> activity, and after about 12-18s the machine is back to normal.
> Nobody knows what's happening during that phase.

This happens when the journal becomes full (and probably also has lots
of pending data buffers to flush, for data=ordered) and it must flush
the journal before any more filesystem activity can occur.  The way to
solve this is to have a flush interval which is shorter than the time
it takes to fill the journal.
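
As a rough illustration of that condition, here is a small Python sketch.
The journal size and logging rate below are made-up numbers, not
measurements from the mail server, and the function names are only for
this example.  It assumes blocks are logged to the journal at a steady
rate, which real workloads will not do.

```python
def seconds_to_fill(journal_mb: float, log_rate_mb_per_s: float) -> float:
    """Time until the journal is full, assuming a steady logging rate."""
    if log_rate_mb_per_s <= 0:
        return float("inf")  # nothing is being logged, journal never fills
    return journal_mb / log_rate_mb_per_s


def flush_interval_ok(flush_interval_s: float,
                      journal_mb: float,
                      log_rate_mb_per_s: float) -> bool:
    """True if a flush happens before the journal can fill up."""
    return flush_interval_s < seconds_to_fill(journal_mb, log_rate_mb_per_s)


if __name__ == "__main__":
    # Hypothetical figures: 32 MB journal, 8 MB/s logged under load.
    print(seconds_to_fill(32, 8))       # fill time in seconds
    print(flush_interval_ok(5, 32, 8))  # 5 s interval vs. that fill time
    print(flush_interval_ok(2, 32, 8))  # 2 s interval vs. that fill time
```

With these assumed figures the journal fills in 4 seconds, so a 5 second
flush interval is too long and a 2 second one is not.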

Either decrease the flush interval so that less data is outstanding at
any time, or increase the size of the journal so that you always have
enough journal space for at least 30 seconds of changes, to allow the
data buffers to be flushed in the background.

Since the latter is not very practical (it might involve huge journals,
and correspondingly more data loss on a crash, unless you are running
with data=journal and your application is syncing all I/O), the
preferred method is simply to reduce the flush interval.  There was a
patch from AKPM which allows setting this on a per-mount basis; please
check the list archives.
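
To see why sizing the journal for 30 seconds of changes gets out of hand,
a back-of-the-envelope sketch in Python follows.  The logging rates are
assumptions picked for illustration only, and the function name is
hypothetical; measure your own workload before drawing conclusions.

```python
def journal_mb_needed(log_rate_mb_per_s: float, window_s: float = 30.0) -> float:
    """Journal space needed to absorb window_s seconds of logged changes."""
    return log_rate_mb_per_s * window_s


if __name__ == "__main__":
    # Assumed steady logging rates in MB/s; not measured values.
    for rate in (0.5, 2.0, 8.0):
        print(f"{rate} MB/s -> {journal_mb_needed(rate):.0f} MB journal")
```

At an assumed 8 MB/s that comes to 240 MB of journal, all of which could
be outstanding at the moment of a crash.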

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

_______________________________________________

Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users