On 6/27/2014 11:40 AM, Arshad Hussain wrote: > On 6/27/2014 10:34 AM, Nicholas A. Bellinger wrote: >> On Thu, 2014-06-26 at 15:30 +0530, Arshad Hussain wrote: >>> On 6/26/2014 2:06 PM, Arshad Hussain wrote: >>>> Hi Nab, >>>> On 6/25/2014 6:21 PM, Arshad Hussain wrote: >>>>> On 6/20/2014 11:17 PM, Nicholas A. Bellinger wrote: >> <SNIP> >> >>>>>> Thanks for confirming. I think this is the correct fix, but would like >>>>>> to verify some more on my end. Please send me the test off-list. >>>> This is what we are trying to achieve. Read from socket1 and socket2 and >>>> then write to socket1 and socket2. I have detailed the flow below. The >>>> code is modified to read and write with different ITT. And this work >>>> just fine. It is only when we read & write on two different sockets with >>>> same ITT it fails. >>>> >>>> Please notice after READ we are receiving stat_sn and incrementing the >>>> counters. This indicates that the stat_sn is being acknowledge. IMHO, >>>> the reads and writes on multiple sockets with same ITT should work. >>>> While we can change the test case to make it pass, by reading and >>>> writing on different ITT. This however will shadow a potential bug in >>>> target. As always your valuable comments. >>>> >>>> READ10 CMD [socket1,ITT=A3A3A3A3] -> [T] >>>> /* Update the counters. At this point stat_sn is ack. */ >>>> next_cmdsn = recvpdu->exp_cmd_sn; >>>> next_expstatsn = (recvpdu->stat_sn) + 0x01; >>>> maxcmdsn = recvpdu->max_cmd_sn; >>>> >>>> READ10 CMD [socket2,ITT=A3A3A3A3] -> [T] >>>> /* Update counters. At this point stat_sn is ack.*/ >>>> next_cmdsn = recvpdu->exp_cmd_sn; >>>> next_expstatsn = (recvpdu->stat_sn) + 0x01; >>>> maxcmdsn = recvpdu->max_cmd_sn; >>>> >>>> WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T] >>>> R2T <- [T] >>>> DATA OUT [socket1,ITT=A5A5A5A5] -> [T] >>>> >>>> WRITE10 CMD [socket2,ITT=A5A5A5A5] -> [T] >>>> R2T <- [T] >>>> DATA OUT [socket1,ITT=A5A5A5A5] -> [T] >>>> >>>> Thanks! >>>> Arshad >>> Just tried another set of cases with multiple connections. >>> >>> // First Write to socket1 >>> WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T] >>> R2T <- [T] >>> DATA OUT [socket1,ITT=A5A5A5A5] -> [T] >>> >>> // Second write to socked2, with different ITT. This works. >>> WRITE10 CMD [socket2,ITT=A3A3A3A3] -> [T] >>> R2T <- [T] >>> DATA OUT [socket1,ITT=A3A3A3A3] -> [T] >>> >>> // Third write again to socket1. ITT reuse. Please Note the writes are >>> ack at this point. >>> WRITE10 CMD [socket1,ITT=A5A5A5A5] -> [T] >>> R2T <- [T] >>> >>> Observation: First and second write works. The third write fails (No >>> crash). Printing below the dmesg. >>> >>> [16066.334410] Command ITT: 0xa5a5a5a5 received DataOUT after last >>> DataOUT received, dumping payload. >>> >> Ok, I think your hitting a race where the reused ITT is being received >> before TX thread context can set ISTATE_SENT_STATUS in >> iscsit_response_queue(), effectively preventing the first ITT from being >> removed via iscsit_ack_from_expstatsn() from the per connection command >> list used by the subsequent Data-Out to locate the second ITT. >> >> Please try the following work-around to ignore descriptors during the >> Data-Out ITT lookup if they have already received all expected Data-Out, >> instead of explicitly dumping payload in iscsit_check_dataout_hdr(). > Yes your patch works. Also, I will be running the whole regression suite > tonight. Will let you know the results. > > Thanks, > Arshad The whole regression works fine with no oops or crash. Thanks Arshad >> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c >> index fd90b28..73355f4 100644 >> --- a/drivers/target/iscsi/iscsi_target_util.c >> +++ b/drivers/target/iscsi/iscsi_target_util.c >> @@ -400,6 +400,8 @@ struct iscsi_cmd *iscsit_find_cmd_from_itt_or_dump( >> >> spin_lock_bh(&conn->cmd_lock); >> list_for_each_entry(cmd, &conn->conn_cmd_list, i_conn_node) { >> + if (cmd->cmd_flags & ICF_GOT_LAST_DATAOUT) >> + continue; >> if (cmd->init_task_tag == init_task_tag) { >> spin_unlock_bh(&conn->cmd_lock); >> return cmd; >> >>> PS: Can you help me where to put BUG()/panic() as I am not being able to >>> provide you with bt. I cannot also get vmcore on crash (previous 3.14 >>> without patch) since it is locking I am unable to enter into >>> dump-capture kernel. Overall, let me know what all more I can get you to >>> help you debug. >>> >> No need for the vmcore without the previous patch. It's clear that the >> first patch addresses the original OOPsen, and it will be included in >> the next PULL request. >> >> Thanks again, >> >> --nab >> >> -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html