On Tue, 2017-02-07 at 11:59 +0100, David Disseldorp wrote:
> On Mon, 6 Feb 2017 19:35:47 +0000, Bart Van Assche wrote:
>
> > On Mon, 2017-02-06 at 17:44 +0100, David Disseldorp wrote:
> > > FWIW, this was configured using the script at:
> > > https://github.com/ddiss/rapido/blob/master/lio_local_autorun.sh
> >
> > Hello David,
> >
> > Thanks for having provided that script, that's very helpful. I ran that script
> > after I had entered the following:
> >
> > _fatal() {
> >         exit 1
> > }
> >
> > DYN_DEBUG_MODULES=
> > DYN_DEBUG_FILES=
> > INITIATOR_IQNS="
> > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test
> > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test-2
> > "
> > TARGET_IQN=tgt1
> > IP_ADDR1=$(ip addr show dev eth0 | sed -n 's,^[[:blank:]]*inet \([^/]*\)/.*$,\1,p')
> > MAC_ADDR1=
> > IP_ADDR2=
> > MAC_ADDR2=foobar
> >
> > Next, I ran the two libiscsi tests mentioned earlier:
> >
> > for ((i=0;i<100;i++)); do
> >     for t in ALL.iSCSITMF.LUNResetSimpleAsync ALL.MultipathIO.Reset; do
> >         iscsi-test-cu --dataloss --allow-sanitize -t $t iscsi://$IP_ADDR1/tgt1/0 iscsi://$IP_ADDR1/tgt1/0
> >     done
> > done
> >
> > That loop completed in about five seconds. Sorry but that means that I am still
> > unable to reproduce the missing TMF reply that you have reported.
>
> Aha - If I run the test against a fileio backed LU then it passes, it
> fails against either of the iblock backed LUs.

That is because all FILEIO backend I/O is synchronous, so no se_cmd
descriptors are ever hitting CMD_T_ABORTED for ABORT_TASK or LUN_RESET
in your test. ;)

> Perhaps this race is
> dependent on the I/O making it to the backstore/block layer by the time
> the LU RESET request comes in? In the past I hit a bug similar to this
> (in the ABORT TASK path), and used the dm-delay device (setup by the
> script) to trip the race.
>
> Do you see the failure when testing against LUN1 or LUN2?

The fatal flaw with patch #19 is that the new se_cmd->finished completion,
introduced to handle all CMD_T_ABORTED cases, can never make forward
progress. The CMD_T_ABORTED logic takes its own se_cmd->cmd_kref in
__target_check_io_state(), and then blocks on
wait_for_completion_timeout(&se_cmd->finished).

In order to complete se_cmd->finished, se_cmd->cmd_kref must reach zero
to call target_release_cmd_kref() -> complete_all(&se_cmd->finished).
But since the tmr kthread caller that is blocked on se_cmd->finished
holds the final se_cmd->cmd_kref reference, the completion can never
fire while it waits, so it's fatal for the simple first order scenario
every time.

Patches #19 + #20 also break the second order case, where CMD_T_ABORTED
happens concurrently with se_session shutdown (CMD_T_FABRIC_STOP).