Hi,

I wasn't sure whether to send this to LKML or here, but DLM seems to be involved. Please let me know if I should repost it somewhere else.

This is a vanilla 2.6.21 kernel patched with cluster-2.00.00 (plus the three extra exports for GFS1). Config attached. The machine froze during the morning updatedb cron job, which ran a recursive find over the shared GFS filesystem. Two other nodes doing the same thing at the same time are still up. I saw a similar hang with cluster-1 not long ago, though that one locked up only the cluster software, not the whole machine.

Please ask if I haven't provided all the necessary information.

clvm: 2.02.26
libdevmapper: 1.02.19
openais: 0.80.2
otherwise a stock Debian Etch system.

--
Regards,
Feri.

kernel BUG at kernel/workqueue.c:212!
invalid opcode: 0000 [#1]
SMP
Modules linked in: button ac battery ipv6 gfs lock_nolock lock_dlm gfs2 dlm configfs loop evdev i2c_piix4 pcspkr psmouse rtc serio_raw sworks_agp agpgart i2c_core xfs dm_mirror dm_snapshot ide_generic dm_round_robin dm_emc dm_multipath dm_mod sd_mod ide_disk ata_generic libata serverworks ohci_hcd generic qla2xxx firmware_class scsi_transport_fc scsi_mod usbcore tg3 ide_core thermal processor fan
CPU:    2
EIP:    0060:[<c012f476>]    Not tainted VLI
EFLAGS: 00010213   (2.6.21gfs-xeon #2)
EIP is at queue_work+0x2f/0x49
eax: dfb176e4   ebx: 00000002   ecx: f7e66a80   edx: dfb176e0
esi: 00000002   edi: e2bfa080   ebp: 00000000   esp: f7a91bb4
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process dlm_recv/2 (pid: 10261, ti=f7a90000 task=c196aa50 task.ti=f7a90000)
Stack: f798d434 f7c5a980 c026dc79 ab0ee1c1 e2bfa080 dfaea000 f798d434 00200000
       00000020 00000000 c1b6bd80 0101e520 e2bfa080 e2bfa080 c0272f90 000000d0
       0000000e f7c5a980 00000000 00000039 00000000 00000000 00000000 00000286
Call Trace:
 [<c026dc79>] tcp_rcv_established+0x53a/0x7d1
 [<c0272f90>] tcp_v4_do_rcv+0x28/0x2c5
 [<c0275306>] tcp_v4_rcv+0x81b/0x88d
 [<c02957a8>] packet_rcv_spkt+0x0/0x150
 [<c024035d>] dev_hard_start_xmit+0x1be/0x21d
 [<c025ccef>] ip_local_deliver+0x187/0x230
 [<c025cb2f>] ip_rcv+0x409/0x442
 [<c02958ed>] packet_rcv_spkt+0x145/0x150
 [<c011b434>] __wake_up+0x32/0x43
 [<c023ff15>] netif_receive_skb+0x2dc/0x350
 [<f8879cfa>] tg3_poll+0x5b6/0x82f [tg3]
 [<c0241a00>] net_rx_action+0x9d/0x1a8
 [<c012608e>] __do_softirq+0x66/0xcc
 [<c0126137>] do_softirq+0x43/0x51
 [<c010648f>] do_IRQ+0x5c/0x71
 [<c010474b>] common_interrupt+0x23/0x28
 [<c0134e03>] down_read_trylock+0x10/0x1d
 [<f8c9d90a>] dlm_receive_message+0xa2/0xc0b [dlm]
 [<c023870d>] sock_common_recvmsg+0x3e/0x54
 [<c02371ff>] sock_recvmsg+0xec/0x107
 [<f8c9fe36>] dlm_process_incoming_buffer+0x11a/0x18c [dlm]
 [<f8ca3e4c>] receive_from_sock+0x124/0x217 [dlm]
 [<c010648f>] do_IRQ+0x5c/0x71
 [<f8ca3b4e>] process_recv_sockets+0xf/0x15 [dlm]
 [<c012f559>] run_workqueue+0x85/0x125
 [<f8ca3b3f>] process_recv_sockets+0x0/0x15 [dlm]
 [<c012fde7>] worker_thread+0xf9/0x124
 [<c011d23f>] default_wake_function+0x0/0xc
 [<c012fcee>] worker_thread+0x0/0x124
 [<c013248a>] kthread+0xb2/0xdc
 [<c01323d8>] kthread+0x0/0xdc
 [<c0104993>] kernel_thread_helper+0x7/0x10
 =======================
Code: 64 8b 35 04 00 00 00 f0 0f ba 2a 00 19 c0 31 db 85 c0 75 2c 8d 41 08 39 41 08 8b 1d f4 94 39 c0 0f 45 de 8d 42 04 39 42 04 74 04 <0f> 0b eb fe 8b 01 f7 d0 8b 04 98 e8 34 ff ff ff bb 01 00 00 00
EIP: [<c012f476>] queue_work+0x2f/0x49 SS:ESP 0068:f7a91bb4
Kernel panic - not syncing: Fatal exception in interrupt
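For what it's worth, if I decode the Code: bytes correctly, the faulting ud2 (<0f> 0b) comes right after a lock bts on the work item's first word (the test_and_set_bit() of the pending bit) and a list_empty() test on work->entry. So workqueue.c:212 appears to be BUG_ON(!list_empty(&work->entry)) in queue_work(): a work item was queued from the TCP receive softirq (while dlm_recv/2 was running) with its pending bit clear but while it was still linked on a list. The user-space C sketch below only models that invariant under those assumptions; struct fake_work, fake_queue_work() and the list helpers are hypothetical names, not kernel code.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list node, modeled on the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n) { n->next = n->prev = n; }

/* A node that points to itself is not linked into any list. */
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical stand-in for struct work_struct: a pending flag plus a list link. */
struct fake_work {
    bool pending;
    struct list_node entry;
};

/*
 * Models the check that appears to fire in queue_work():
 * returns 1 if the item was queued, 0 if it was already pending, and -1 if the
 * invariant "not pending => not on any list" is violated (where the kernel BUG()s).
 */
static int fake_queue_work(struct list_node *queue, struct fake_work *w)
{
    if (w->pending)              /* test_and_set_bit() found the bit already set */
        return 0;
    w->pending = true;           /* the bit is now set, as after test_and_set_bit() */
    if (!list_is_empty(&w->entry))
        return -1;               /* corresponds to BUG_ON(!list_empty(&work->entry)) */
    list_add_tail(&w->entry, queue);
    return 1;
}
```

Queuing a fresh item succeeds, a second call reports it as already pending, and clearing the flag while the item is still linked makes fake_queue_work() return -1, which is the point where the kernel would have hit the BUG.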
Attachment:
config.gz
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster