Hello!

We found a tgtd crash (core dump) and have a fix for it.

1: We use the stgt code from:

   git clone https://github.com/fujita/tgt.git stgt
   git commit: 44c11763d71dc019741c84a857080cd0b4a2f265

2: Our test case: we export 10 SATA disks with stgt. Each disk has a capacity of 100 GB, and each target has one LUN. After logging in from the iSCSI initiator, we use FIO to run parallel random-write tests against all 10 devices, while the iSCSI network is frequently disconnected.

3: After about 5 hours of testing, the tgtd process crashed with a core dump. The stack trace is shown below:

[New LWP 121362]
[New LWP 121349]
[New LWP 121350]
[New LWP 121370]
[New LWP 121404]
[New LWP 121365]
[New LWP 121348]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `tgtd -d 1'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000040a259 in __list_del (prev=0x7cd38c0, next=0x5aff340) at ./list.h:83
83		prev->next = next;
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.3-9.el7.x86_64 libgcc-4.8.5-4.el7.x86_64
(gdb) bt
#0  0x000000000040a259 in __list_del (prev=0x7cd38c0, next=0x5aff340) at ./list.h:83
#1  0x000000000040a296 in list_del (entry=0x623d080) at ./list.h:88
#2  0x000000000040edd6 in iscsi_free_cmd_task (task=0x623d010) at iscsi/iscsid.c:1254
#3  0x000000000040ee6a in iscsi_scsi_cmd_done (nid=3356, result=0, scmd=0x623d0e0) at iscsi/iscsid.c:1269
#4  0x0000000000432dc0 in target_cmd_io_done (cmd=0x623d0e0, result=0) at target.c:1236
#5  0x000000000045be39 in bs_sig_request_done (fd=10, events=1, data=0x0) at bs.c:210
#6  0x0000000000428ff2 in event_loop () at tgtd.c:432
#7  0x0000000000429fca in main (argc=3, argv=0x7ffffcec25f8) at tgtd.c:624

4: Our analysis: tgtd crashes because of I/O that is in flight while the iSCSI network is frequently connected and disconnected. Each disconnection closes the current iSCSI session, so stgt calls conn_close() frequently. conn_close() releases all outstanding iSCSI tasks, and we found that some tasks of type ISCSI_OP_SCSI_CMD are freed without first being removed (list_del) from session->cmd_list. A freed task can therefore remain linked on cmd_list, so a later list_del() of a neighboring entry (here from iscsi_free_cmd_task()) writes through a stale pointer, which would explain the segfault in __list_del() above.

5: Our patch for the tgtd crash:

From 2abf9229ea206172f2fe244539f7f87ed404eff8 Mon Sep 17 00:00:00 2001
From: Chen Fangxian <chenfangxian@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 22 Jul 2016 10:37:23 +0800
Subject: [PATCH] iscsi: fix segfault at conn_close

Remove iSCSI tasks from conn->session->cmd_list before freeing them;
otherwise the tgtd process may crash.

Below is the backtrace:

Program terminated with signal 11, Segmentation fault.
#0  0x000000000040a259 in __list_del (prev=0x7cd38c0, next=0x5aff340) at ./list.h:83
83		prev->next = next;
(gdb) bt
#0  0x000000000040a259 in __list_del (prev=0x7cd38c0, next=0x5aff340) at ./list.h:83
#1  0x000000000040a296 in list_del (entry=0x623d080) at ./list.h:88
#2  0x000000000040edd6 in iscsi_free_cmd_task (task=0x623d010) at iscsi/iscsid.c:1254
#3  0x000000000040ee6a in iscsi_scsi_cmd_done (nid=3356, result=0, scmd=0x623d0e0) at iscsi/iscsid.c:1269
#4  0x0000000000432dc0 in target_cmd_io_done (cmd=0x623d0e0, result=0) at target.c:1236
#5  0x000000000045be39 in bs_sig_request_done (fd=10, events=1, data=0x0) at bs.c:210
#6  0x0000000000428ff2 in event_loop () at tgtd.c:432
#7  0x0000000000429fca in main (argc=3, argv=0x7ffffcec25f8) at tgtd.c:624

Signed-off-by: Meng Lingkun <menglingkun@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Wang Zhengyong <wangzhengyong@xxxxxxxxxxxxxxxxxxxx>
Reviewed-by: Wang Dongxu <wangdongxu@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Chen Fangxian <chenfangxian@xxxxxxxxxxxxxxxxxxxx>
---
 usr/iscsi/conn.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/usr/iscsi/conn.c b/usr/iscsi/conn.c
index e7d4e8c..3ed08a8 100644
--- a/usr/iscsi/conn.c
+++ b/usr/iscsi/conn.c
@@ -83,6 +83,22 @@ void conn_exit(struct iscsi_connection *conn)
 	session_put(session);
 }
 
+static int find_task_in_cmd_list(struct iscsi_connection *conn,
+				struct iscsi_task *find_task)
+{
+	struct iscsi_task *task = NULL;
+	int find = 0;
+
+	list_for_each_entry(task, &conn->session->cmd_list, c_hlist) {
+		if (task == find_task) {
+			find = 1;
+			break;
+		}
+	}
+
+	return find;
+}
+
 void conn_close(struct iscsi_connection *conn)
 {
 	struct iscsi_task *task, *tmp;
@@ -134,6 +150,7 @@ void conn_close(struct iscsi_connection *conn)
 	list_for_each_entry_safe(task, tmp, &conn->tx_clist, c_list) {
 		uint8_t op;
 
+		list_del(&task->c_list);
 		op = task->req.opcode & ISCSI_OPCODE_MASK;
 
 		eprintf("Forcing release of tx task %p %" PRIx64 " %x\n",
@@ -146,10 +163,14 @@ void conn_close(struct iscsi_connection *conn)
 			 * would be a better way to see
 			 * task->scmd.c_target though.
 			 */
-			if (task->scmd.c_target)
+			if (task->scmd.c_target) {
 				iscsi_free_cmd_task(task);
-			else
+			} else {
+				if (find_task_in_cmd_list(conn, task))
+					list_del(&task->c_hlist);
+
 				iscsi_free_task(task);
+			}
 			break;
 		case ISCSI_OP_NOOP_IN:
 			/* NOOP_IN req is allocated within iscsi_tcp
@@ -181,6 +202,9 @@ void conn_close(struct iscsi_connection *conn)
 	if (conn->rx_task) {
 		eprintf("Forcing release of rx task %p %" PRIx64 "\n",
 			conn->rx_task, conn->rx_task->tag);
+		if (find_task_in_cmd_list(conn, conn->rx_task))
+			list_del(&conn->rx_task->c_hlist);
+
 		iscsi_free_task(conn->rx_task);
 	}
 	conn->rx_task = NULL;
@@ -193,6 +217,10 @@ void conn_close(struct iscsi_connection *conn)
 		 */
 		if (task_in_scsi(task))
 			continue;
+
+		if (find_task_in_cmd_list(conn, task))
+			list_del(&task->c_hlist);
+
 		iscsi_free_task(task);
 	}
 done:
-- 
1.8.3.1
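The failure mode in item 4 can be illustrated outside tgtd. The sketch below is a minimal, standalone model, not tgt code: it assumes a stripped-down circular intrusive list in the Linux-kernel style that tgt's list.h follows, and a hypothetical `struct task` carrying only a `c_hlist` link. The names `list_head`, `container_of`, `task_in_list`, `release_task`, `new_task` and `list_count` are illustrative stand-ins. `task_in_list()` mirrors the patch's find_task_in_cmd_list(), and `release_task()` shows the pattern the patch applies in conn_close(): if the task is still linked on cmd_list, unlink it before freeing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel-style circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry from its neighbors, then clear its own pointers. */
static void list_unlink(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

/* Hypothetical task: only the session cmd_list linkage is modelled. */
struct task {
	int tag;
	struct list_head c_hlist;
};

/* Mirrors find_task_in_cmd_list(): linear scan of cmd_list. */
static bool task_in_list(struct list_head *cmd_list, struct task *find)
{
	for (struct list_head *p = cmd_list->next; p != cmd_list; p = p->next)
		if (container_of(p, struct task, c_hlist) == find)
			return true;
	return false;
}

/* Fix pattern: remove the task from cmd_list (if linked) before free(). */
static void release_task(struct list_head *cmd_list, struct task *t)
{
	if (task_in_list(cmd_list, t))
		list_unlink(&t->c_hlist);
	free(t);
}

/* Count entries, checking that each node's back pointer is consistent. */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next) {
		assert(p->next->prev == p);
		n++;
	}
	return n;
}

static struct task *new_task(struct list_head *cmd_list, int tag)
{
	struct task *t = malloc(sizeof(*t));

	if (!t)
		abort();
	t->tag = tag;
	list_add_tail(&t->c_hlist, cmd_list);
	return t;
}
```

For example, creating three tasks with `new_task()`, releasing the middle one with `release_task()` and then walking the list with `list_count()` finds two entries whose prev/next pointers are still consistent. If `free()` were called without the unlink, the neighbors would keep pointing at the freed node. The membership scan costs O(n) per released task, but it makes the unlink safe whether or not the task was ever queued on cmd_list.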
Attachment:
0001-iscsi-fix-segfault-at-conn_close.patch