On Thu, May 24, 2007 at 03:51:08PM +0200, Wagner Ferenc wrote:
> Hi,
>
> I wasn't sure whether to send this to LKML or here, but DLM seems
> involved. Please let me know if I'd better repost it to somewhere
> else.

Here is good.

> It's a vanilla 2.6.21 kernel patched by cluster-2.00.00 (with the
> three extra export for GFS1). Config attached. The machine froze
> during the morning updatedb cronjob, which performed a recursive find
> into the shared GFS filesystem. Two other nodes doing the same at the
> same time are still up.
>
> I experienced a similar hang with cluster-1 not long ago, though that
> didn't lock up the whole machine, but the cluster software only.

updatedb, even on just one node (much less all) is never going to be a
good thing to run on gfs... our standard response is "don't do that".

> Please ask back if I didn't provide all information necessary.

I also ran into this bug last week and was testing some patches from
Patrick to try to figure it out -- I got distracted with other things but
will get back to it again soon. My test that hit it was doing looping
mount/unmount on four nodes. Thanks for the good report.

Dave

> CPU: 2
> EIP: 0060:[<c012f476>] Not tainted VLI
> EFLAGS: 00010213 (2.6.21gfs-xeon #2)
> EIP is at queue_work+0x2f/0x49
> eax: dfb176e4 ebx: 00000002 ecx: f7e66a80 edx: dfb176e0
> esi: 00000002 edi: e2bfa080 ebp: 00000000 esp: f7a91bb4
> ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
> Process dlm_recv/2 (pid: 10261, ti=f7a90000 task=c196aa50 task.ti=f7a90000)
> Stack: f798d434 f7c5a980 c026dc79 ab0ee1c1 e2bfa080 dfaea000 f798d434 00200000
>        00000020 00000000 c1b6bd80 0101e520 e2bfa080 e2bfa080 c0272f90 000000d0
>        0000000e f7c5a980 00000000 00000039 00000000 00000000 00000000 00000286
> Call Trace:
>  [<c026dc79>] tcp_rcv_established+0x53a/0x7d1
>  [<c0272f90>] tcp_v4_do_rcv+0x28/0x2c5
>  [<c0275306>] tcp_v4_rcv+0x81b/0x88d
>  [<c02957a8>] packet_rcv_spkt+0x0/0x150
>  [<c024035d>] dev_hard_start_xmit+0x1be/0x21d
>  [<c025ccef>] ip_local_deliver+0x187/0x230
>  [<c025cb2f>] ip_rcv+0x409/0x442
>  [<c02958ed>] packet_rcv_spkt+0x145/0x150
>  [<c011b434>] __wake_up+0x32/0x43
>  [<c023ff15>] netif_receive_skb+0x2dc/0x350
>  [<f8879cfa>] tg3_poll+0x5b6/0x82f [tg3]
>  [<c0241a00>] net_rx_action+0x9d/0x1a8
>  [<c012608e>] __do_softirq+0x66/0xcc
>  [<c0126137>] do_softirq+0x43/0x51
>  [<c010648f>] do_IRQ+0x5c/0x71
>  [<c010474b>] common_interrupt+0x23/0x28
>  [<c0134e03>] down_read_trylock+0x10/0x1d
>  [<f8c9d90a>] dlm_receive_message+0xa2/0xc0b [dlm]
>  [<c023870d>] sock_common_recvmsg+0x3e/0x54
>  [<c02371ff>] sock_recvmsg+0xec/0x107
>  [<f8c9fe36>] dlm_process_incoming_buffer+0x11a/0x18c [dlm]
>  [<f8ca3e4c>] receive_from_sock+0x124/0x217 [dlm]
>  [<c010648f>] do_IRQ+0x5c/0x71
>  [<f8ca3b4e>] process_recv_sockets+0xf/0x15 [dlm]
>  [<c012f559>] run_workqueue+0x85/0x125
>  [<f8ca3b3f>] process_recv_sockets+0x0/0x15 [dlm]
>  [<c012fde7>] worker_thread+0xf9/0x124
>  [<c011d23f>] default_wake_function+0x0/0xc
>  [<c012fcee>] worker_thread+0x0/0x124
>  [<c013248a>] kthread+0xb2/0xdc
>  [<c01323d8>] kthread+0x0/0xdc
>  [<c0104993>] kernel_thread_helper+0x7/0x10
>  =======================
> Code: 64 8b 35 04 00 00 00 f0 0f ba 2a 00 19 c0 31 db 85 c0 75 2c 8d 41 08 39 41 08 8b 1d f4 94 39 c0 0f 45 de 8d 42 04 39 42 04 74 04 <0f> 0b eb fe 8b 01 f7 d0 8b 04 98 e8 34 ff ff ff bb 01 00 00 00
> EIP: [<c012f476>] queue_work+0x2f/0x49 SS:ESP 0068:f7a91bb4
> Kernel panic - not syncing: Fatal exception in interrupt

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster