[Linux-cluster] Node hang

"Manuel Bujan" <bujan@xxxxxxxxxxxxxxxx> · Thu, 17 Feb 2005 14:43:59 -0500

Hello guys,

After 3 days of a heavy read/write test load, one of our nodes crashed with the 
following error:

Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: fatal: invalid metadata block
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   bh = 13156295 (magic)
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   function = gfs_get_data_buffer
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   file = /usr/src/cluster/gfs-kernel/src/gfs/dio.c, line = 1328
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   time = 1108659988
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: about to withdraw from the cluster
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: waiting for outstanding I/O
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0: telling LM to withdraw
Feb 17 12:06:35 atmail-1 kernel: lock_dlm: withdraw abandoned memory
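
For anyone who wants the fields from the withdraw message in a more convenient form, here is a minimal Python sketch that pulls out bh, function, file, line and time and converts the time value to a UTC date. It assumes the time field is seconds since the Unix epoch; the names LOG, FIELD and parse_withdraw are only illustrative, and the sample text is copied from the log above. The pattern tolerates the line wrapping added by the mail client.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the log in this message. Two of them are left
# wrapped on purpose to show that line breaks after "=" are tolerated.
LOG = """\
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   bh =
13156295 (magic)
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   function =
gfs_get_data_buffer
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   file = /usr/src/cluster/gfs-kernel/src/gfs/dio.c, line = 1328
Feb 17 12:06:28 atmail-1 kernel: GFS: fsid=ISQCLUSTER:gfs001.0:   time = 1108659988
"""

# "name = value" pairs; the value ends at whitespace or a comma.
FIELD = re.compile(r"\b(bh|function|file|line|time)\s*=\s*([^\s,]+)")


def parse_withdraw(text):
    """Return the key/value fields found in a GFS withdraw message."""
    fields = dict(FIELD.findall(text))
    # These three fields are numeric in the log above.
    for key in ("bh", "line", "time"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


info = parse_withdraw(LOG)
# Assumption: "time" is seconds since the Unix epoch.
when = datetime.fromtimestamp(info["time"], tz=timezone.utc)

print(info)
print(when.isoformat())  # 2005-02-17T17:06:28+00:00
```

The converted time is 17:06:28 UTC, which agrees with the 12:06:28 syslog stamp at our -0500 offset. Note that this only reorganises what the log already contains: the message carries a block number (bh), not a file name.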

We are mounting our GFS partition with the noatime option, and quotas have 
been disabled in order to improve performance. The applications currently 
running are postfix, apache, and Courier/Imap.

We are using the CVS version available on Feb 14 around 5:00 PM.

Can anyone shed some light on this?

Is there any way to tell from the log exactly which file the server was 
trying to read or write when it crashed?

Regards

Bujan