Hi,

We have a cluster of two Red Hat machines (2.6.7 SMP kernel) using GFS, and we experience random stability issues.

Every 2 days or so, a lock_dlm error message is dumped to the log (see below). At this point, either both machines are unable to access the GFS file system (hanging on ls, df, ...), or a random process that was accessing a file hangs on one of the machines (always a different process: tar, gzip, mv, ...) and cannot be terminated.

At this point the only thing we can do is reboot both nodes.

We haven't found a way to reproduce this problem; it seems to happen randomly.

We have done the following to try to eliminate the problem (without success or improvement):
- Shut down machine A and run all services on machine B
- Shut down machine B and run all services on machine A
- Disable heavy I/O on both machines (mainly full daily backups)

The error message is the following:

------
Sep 13 15:05:43 L1_OAS56_B kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000005
Sep 13 15:05:43 L1_OAS56_B kernel: printing eip:
Sep 13 15:05:43 L1_OAS56_B kernel: c013a1f6
Sep 13 15:05:43 L1_OAS56_B kernel: *pde = 17aea001
Sep 13 15:05:43 L1_OAS56_B kernel: Oops: 0002 [#1]
Sep 13 15:05:43 L1_OAS56_B kernel: SMP
Sep 13 15:05:43 L1_OAS56_B kernel: Modules linked in: nfsd exportfs ipv6 autofs e1000 af_packet parport_pc parport ohci_hcd ehci_hcd lock_dlm dlm cman gfs lock_harness dm_mod floppy uhci_hcd usbcore thermal processor fan button battery asus_acpi ac ext3 jbd loop ide_cd cdrom qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod i2o_block i2o_core
Sep 13 15:05:43 L1_OAS56_B kernel: CPU: 2
Sep 13 15:05:43 L1_OAS56_B kernel: EIP: 0060:[<c013a1f6>] Not tainted
Sep 13 15:05:43 L1_OAS56_B kernel: EFLAGS: 00010083 (2.6.7)
Sep 13 15:05:43 L1_OAS56_B kernel: EIP is at find_get_pages+0x41/0x5a
Sep 13 15:05:43 L1_OAS56_B kernel: eax: 00000001 ebx: d6d2de4c ecx: 00000010 edx: 00000004
Sep 13 15:05:43 L1_OAS56_B kernel: esi: f274a724 edi: e00f2240 ebp: d6d2ddfc esp: d6d2dde4
Sep 13 15:05:43 L1_OAS56_B kernel: ds: 007b es: 007b ss: 0068
Sep 13 15:05:43 L1_OAS56_B kernel: Process lock_dlm (pid: 1575, threadinfo=d6d2c000 task=f7b945c0)
Sep 13 15:05:43 L1_OAS56_B kernel: Stack: f274a728 d6d2de4c 00000000 00000010 d6d2de44 f274a724 d6d2de18 c01441ed
Sep 13 15:05:43 L1_OAS56_B kernel: f274a724 00000000 00000010 d6d2de4c 00000000 d6d2dea0 c01444d0 d6d2de44
Sep 13 15:05:43 L1_OAS56_B kernel: f274a724 00000000 00000010 c3207870 00000000 d6d2c000 00000000 00000000
Sep 13 15:05:43 L1_OAS56_B kernel: Call Trace:
Sep 13 15:05:43 L1_OAS56_B kernel: [<c0106c6b>] show_stack+0x80/0x96
Sep 13 15:05:43 L1_OAS56_B kernel: [<c0106e02>] show_registers+0x15f/0x1ae
Sep 13 15:05:43 L1_OAS56_B kernel: [<c0106f77>] die+0x8d/0xfb
Sep 13 15:05:43 L1_OAS56_B kernel: [<c0117e86>] do_page_fault+0x270/0x579
Sep 13 15:05:43 L1_OAS56_B kernel: [<c0106911>] error_code+0x2d/0x38
Sep 13 15:05:43 L1_OAS56_B kernel: [<c01441ed>] pagevec_lookup+0x2c/0x35
Sep 13 15:05:43 L1_OAS56_B kernel: [<c01444d0>] truncate_inode_pages+0x71/0x29f
Sep 13 15:05:43 L1_OAS56_B kernel: [<fa9bdc40>] gfs_inval_buf+0x45/0x88 [gfs]
Sep 13 15:05:43 L1_OAS56_B kernel: [<fa9cd06b>] inode_go_inval+0x45/0x4f [gfs]
Sep 13 15:05:43 L1_OAS56_B kernel: [<fa9c9ec3>] drop_bh+0x15f/0x1d6 [gfs]
Sep 13 15:05:43 L1_OAS56_B kernel: [<fa9cb4bd>] gfs_glock_cb+0x167/0x1f4 [gfs]
Sep 13 15:05:43 L1_OAS56_B kernel: [<fa928ace>] process_complete+0x103/0x34c [lock_dlm]
Sep 13 15:05:43 L1_OAS56_B kernel: [<fa928ee2>] dlm_async+0x1cb/0x290 [lock_dlm]
Sep 13 15:05:43 L1_OAS56_B kernel: [<c0104291>] kernel_thread_helper+0x5/0xb
Sep 13 15:05:43 L1_OAS56_B kernel:
Sep 13 15:05:43 L1_OAS56_B kernel: Code: f0 ff 40 04 83 c2 01 39 ca 72 f2 c6 46 10 01 fb 83 c4 10 5b
------

Any idea what's wrong, or what we should check next?
Is it possible to "unlock" the machines after such an error without a reboot?

The release version is DEVEL.1090589850.

Thanks for your help,

Stéphane Messerli
Senior Support & Project Engineer, Technology Europe
smesserli@xxxxxxxxxxxxx
24/7 Real Media (NASDAQ: TFSM)
Route de la Pierre
1024 Ecublens
Switzerland
tel. +41 21 695 97 46
fax +41 21 695 97 01