On Mon, 4 Jul 2011 23:36:39 +0800
Kiefer Chang <zapchang@xxxxxxxxx> wrote:

> Dear Tomonori,
>
> We got a segfault under heavy I/O. I hope you can give us some suggestions.
>
> [Setting]
> 7 machines; each machine runs a VM and each VM uses 10 targets on
> tgtd. Each machine is equipped with 1GbE cards.
> So there are at least 70 volumes on tgtd.
>
> tgtd (1.0.16) is running on a machine with two bonded 10GbE cards.
> The backing stores of the targets are LVM logical volumes
> (the physical volume is on software RAID 5).
>
> Both the initiator side and the target side are running CentOS 5.4.
>
> I tried to set up the system so that a core dump is generated when
> the problem hits. The core dump file seems incomplete: its apparent
> size is 8G+, but it only uses about 30~50M of disk space.
>
> So I attached gdb to a debug build (make DEBUG=1) of tgtd.
> (The symptom is much easier to reproduce during a heavy I/O test and
> with an optimized build of tgtd (-O2).)
> When the symptom showed up, I got the following backtrace (only the
> latest part is pasted):
> ============
> ..
> [New Thread 0x2aabbaa5d940 (LWP 20176)]
> [New Thread 0x2aabbb45e940 (LWP 20177)]
> [New Thread 0x2aabbbe5f940 (LWP 20227)]
> [New Thread 0x2aabbc860940 (LWP 20228)]
> [New Thread 0x2aabbd261940 (LWP 20229)]
> [New Thread 0x2aabbdc62940 (LWP 20230)]
> [New Thread 0x2aabbe663940 (LWP 20258)]
> [New Thread 0x2aabbf064940 (LWP 20259)]
> [New Thread 0x2aabbfa65940 (LWP 20265)]
> [New Thread 0x2aabc0466940 (LWP 20266)]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000040889d in iscsi_data_out_rx_start (conn=0x10f26028) at
> iscsi/iscsid.c:1524
> 1524 if (task->tag == req->itt)
> (gdb) bt
> #0 0x000000000040889d in iscsi_data_out_rx_start (conn=0x10f26028) at
> iscsi/iscsid.c:1524
> #1 0x0000000000409360 in iscsi_task_rx_start (conn=0x10f26028) at
> iscsi/iscsid.c:1729
> #2 0x0000000000409d42 in iscsi_rx_handler (conn=0x10f26028) at
> iscsi/iscsid.c:1986
> #3 0x0000000000411ba6 in iscsi_tcp_event_handler (fd=445, events=5,
> data=0x10f26028) at iscsi/iscsi_tcp.c:158
> #4 0x0000000000417365 in event_loop () at tgtd.c:454
> #5 0x0000000000417a16 in main (argc=1, argv=0x7fffd5eb9a98) at tgtd.c:640
> (gdb)
>
> (gdb) print task
> $5 = (struct iscsi_task *) 0xffffffffffffff90
> (gdb) print req
> $6 = (struct iscsi_data *) 0x10f26148
> (gdb)
>
> (gdb) p task->req
> Cannot access memory at address 0xffffffffffffff90
> (gdb) p task->rsp
> Cannot access memory at address 0xffffffffffffffc0
> (gdb) p task->tag
> Cannot access memory at address 0xfffffffffffffff0
>
>
> (gdb) p req->opcode
> $30 = 5 '\005'
> (gdb) p req->flags
> $31 = 128 '\200'
> (gdb) p req->rsvd2
> $32 = "\000"
>
> ============
>
> The system log can be downloaded from here:
> http://dl.dropbox.com/u/8354750/tgtd/20110704/messages
>
> It seems *task* is freed and then referenced again.

Is this related to TMF (aborting tasks, etc.)? Your next report is.