I tried to reproduce the symptom today; the following crash point was hit several times.

[New Thread 0x2aabcae77940 (LWP 25576)]
[New Thread 0x2aabcb878940 (LWP 25606)]
[New Thread 0x2aabcc279940 (LWP 25610)]
[New Thread 0x2aabccc7a940 (LWP 25611)]
[New Thread 0x2aabcd67b940 (LWP 25612)]
[New Thread 0x2aabce07c940 (LWP 25983)]
[New Thread 0x2aabcea7d940 (LWP 25989)]
[New Thread 0x2aabcf47e940 (LWP 25990)]
[New Thread 0x2aabcfe7f940 (LWP 25991)]
[New Thread 0x2aabd0880940 (LWP 26017)]
[New Thread 0x2aabd1281940 (LWP 26018)]
[New Thread 0x2aabd1c82940 (LWP 26019)]
[New Thread 0x2aabd2683940 (LWP 26020)]
[New Thread 0x2aabd3084940 (LWP 26097)]
[New Thread 0x2aabd3a85940 (LWP 26112)]
[New Thread 0x2aabd4486940 (LWP 26113)]
[New Thread 0x2aabd4e87940 (LWP 26114)]
[New Thread 0x2aabd5888940 (LWP 26135)]
[New Thread 0x2aabd6289940 (LWP 26136)]
[New Thread 0x2aabd6c8a940 (LWP 26137)]
[New Thread 0x2aabd768b940 (LWP 26138)]

Program received signal SIGSEGV, Segmentation fault.
0x000000000041c5b7 in abort_task_set (mreq=0x115eef00, target=0x103be510, itn_id=2478, tag=805306479, lun=0x0, all=0) at target.c:1155
1155            list_for_each_entry_safe(cmd, tmp, list, c_hlist) {
(gdb) bt
#0  0x000000000041c5b7 in abort_task_set (mreq=0x115eef00, target=0x103be510, itn_id=2478, tag=805306479, lun=0x0, all=0) at target.c:1155
#1  0x000000000041c7ee in target_mgmt_request (tid=21440, itn_id=2478, req_id=277390720, function=13, lun_buf=0x1088a588 "", tag=805306479, host_no=0) at target.c:1202
#2  0x00000000004085be in iscsi_tm_execute (task=0x1088a580) at iscsi/iscsid.c:1431
#3  0x0000000000408755 in iscsi_task_execute (task=0x1088a580) at iscsi/iscsid.c:1480
#4  0x0000000000408b04 in iscsi_task_queue (task=0x1088a580) at iscsi/iscsid.c:1557
#5  0x000000000040927b in iscsi_task_rx_done (conn=0x11910c88) at iscsi/iscsid.c:1698
#6  0x000000000040a323 in iscsi_rx_handler (conn=0x11910c88) at iscsi/iscsid.c:2114
#7  0x0000000000411ba6 in iscsi_tcp_event_handler (fd=428, events=1, data=0x11910c88) at
iscsi/iscsi_tcp.c:158
#8  0x0000000000417365 in event_loop () at tgtd.c:454
#9  0x0000000000417a16 in main (argc=1, argv=0x7fffa114a938) at tgtd.c:640
(gdb)
(gdb) p i
$1 = 6
(gdb) print ARRAY_SIZE(itn->cmd_hash_list)
No symbol "ARRAY_SIZE" in current context.
(gdb) p cmd
$2 = (struct scsi_cmd *) 0x5287f1ab8a18d390
(gdb) p list
$3 = (struct list_head *) 0x10580ba0
(gdb) p tmp
$4 = (struct scsi_cmd *) 0x5287f1ab8a18d390
(gdb) p cmd->dev
Cannot access memory at address 0x5287f1ab8a18d3c0
(gdb) p list
$5 = (struct list_head *) 0x10580ba0
(gdb) p itn
$6 = (struct it_nexus *) 0x10580b30
==

The system log can be downloaded from the following URL:
http://dl.dropbox.com/u/8354750/tgtd/20110705/reproduce_02/messages.zip

Thanks a lot.
Kiefer Chang

2011/7/4 Kiefer Chang <zapchang@xxxxxxxxx>:
> Dear Tomonori,
>
> We are getting a segfault under heavy I/O. I hope you can give some
> suggestions.
>
> [Setting]
> 7 machines; each machine runs a VM and each VM uses 10 targets on
> tgtd. Each machine is equipped with 1Gb cards.
> So there are 70+ volumes on tgtd.
>
> tgtd (1.0.16) runs on a machine with two bonded 10GbE cards.
> LVM logical volumes are used as the backing stores of the targets.
> (The physical volume is on software RAID 5.)
>
> Both the initiator side and the target side run CentOS 5.4.
>
> I tried to set up the system so that a core dump is generated when the
> problem hits. The core dump file seems incomplete: its size is 8G+,
> but it only uses about 30~50M of disk space.
>
> So I attached gdb to a debug build (make DEBUG=1) of tgtd.
> (The symptom is much easier to reproduce during a heavy I/O test and
> with an optimized build of tgtd (-O2).)
> When the symptom shows, I get the following backtrace (only the latest
> part is pasted):
> ============
> ..
> [New Thread 0x2aabbaa5d940 (LWP 20176)]
> [New Thread 0x2aabbb45e940 (LWP 20177)]
> [New Thread 0x2aabbbe5f940 (LWP 20227)]
> [New Thread 0x2aabbc860940 (LWP 20228)]
> [New Thread 0x2aabbd261940 (LWP 20229)]
> [New Thread 0x2aabbdc62940 (LWP 20230)]
> [New Thread 0x2aabbe663940 (LWP 20258)]
> [New Thread 0x2aabbf064940 (LWP 20259)]
> [New Thread 0x2aabbfa65940 (LWP 20265)]
> [New Thread 0x2aabc0466940 (LWP 20266)]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000040889d in iscsi_data_out_rx_start (conn=0x10f26028) at
> iscsi/iscsid.c:1524
> 1524                    if (task->tag == req->itt)
> (gdb) bt
> #0  0x000000000040889d in iscsi_data_out_rx_start (conn=0x10f26028) at
> iscsi/iscsid.c:1524
> #1  0x0000000000409360 in iscsi_task_rx_start (conn=0x10f26028) at
> iscsi/iscsid.c:1729
> #2  0x0000000000409d42 in iscsi_rx_handler (conn=0x10f26028) at
> iscsi/iscsid.c:1986
> #3  0x0000000000411ba6 in iscsi_tcp_event_handler (fd=445, events=5,
> data=0x10f26028) at iscsi/iscsi_tcp.c:158
> #4  0x0000000000417365 in event_loop () at tgtd.c:454
> #5  0x0000000000417a16 in main (argc=1, argv=0x7fffd5eb9a98) at tgtd.c:640
> (gdb)
>
> (gdb) print task
> $5 = (struct iscsi_task *) 0xffffffffffffff90
> (gdb) print req
> $6 = (struct iscsi_data *) 0x10f26148
> (gdb)
>
> (gdb) p task->req
> Cannot access memory at address 0xffffffffffffff90
> (gdb) p task->rsp
> Cannot access memory at address 0xffffffffffffffc0
> (gdb) p task->tag
> Cannot access memory at address 0xfffffffffffffff0
>
>
> (gdb) p req->opcode
> $30 = 5 '\005'
> (gdb) p req->flags
> $31 = 128 '\200'
> (gdb) p req->rsvd2
> $32 = "\000"
>
> ============
>
> The system log can be downloaded from here:
> http://dl.dropbox.com/u/8354750/tgtd/20110704/messages
>
> Seems *task* is freed and referenced again.
> Hope I can get some feedback.
> Thanks a lot.
>
> --
> Kiefer Chang
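
A note on the quoted backtrace: the value of task (0xffffffffffffff90) is what a list_entry()/container_of() style macro would compute if it were handed a NULL list node, since it subtracts the member offset from the node address. The sketch below only reproduces that arithmetic. struct fake_task, entry_address() and field_address() are hypothetical names, not tgt code; the req/rsp/tag offsets are inferred from the gdb output above, and the list member offset of 0x70 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal list node in the style of a kernel-like list.h. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * HYPOTHETICAL stand-in for struct iscsi_task; not the real definition.
 * Field offsets are inferred from the gdb output (req at +0x00,
 * rsp at +0x30, tag at +0x60). The position of the list member (+0x70)
 * is an assumption, chosen because it explains the observed address.
 */
struct fake_task {
	unsigned char req[0x30];
	unsigned char rsp[0x30];
	uint64_t tag;
	unsigned char pad[0x08];
	struct list_head link;
};

/*
 * Address a list_entry()/container_of() style macro would compute for a
 * list node located at node_addr: node address minus the member offset.
 * Plain unsigned arithmetic is used so nothing is dereferenced and the
 * wrap-around below zero is well defined.
 */
uint64_t entry_address(uint64_t node_addr)
{
	assert(offsetof(struct fake_task, link) == 0x70);
	return node_addr - (uint64_t)offsetof(struct fake_task, link);
}

/* Address of one field inside the entry derived from node_addr. */
uint64_t field_address(uint64_t node_addr, size_t field_offset)
{
	return entry_address(node_addr) + (uint64_t)field_offset;
}
```

With node_addr = 0, entry_address() gives 0xffffffffffffff90 and the tag field lands at 0xfffffffffffffff0, the same addresses gdb could not read. That would be consistent with the loop following a next pointer that had been zeroed (for example in memory that was freed and reused), though the traces alone do not prove it.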