Re: [PATCH 14/14] uas: fix race with TMF performing an abort and regular finish

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 17-02-16 16:50, Oliver Neukum wrote:
A successful TMF means that the URBs for a command must be killed.
Also fixes sleeping with a spinlock.

Signed-off-by: Oliver Neukum <oneukum@xxxxxxxx>
---
  drivers/usb/storage/uas.c | 35 ++++++++++++++++-------------------
  1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index d789227..5eaa270 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -57,7 +57,7 @@ struct uas_dev_info {
  	spinlock_t lock;
  	struct work_struct work;
  	struct completion deathknell;
-	bool tmf_perfomed; /* an ack to a TMF was seen */
+	bool tmf_performed; /* an ack to a TMF was seen */
  	bool tmf_catastrophic; /* TMF failed, retry useless */
  	bool tmf_error; /* tmf may be retried after error handling */
  };
@@ -271,9 +271,9 @@ static void finish_tmf(struct uas_dev_info *devinfo, struct response_iu *riu, st
  	uas_log_cmd_state(cmnd, "expected response iu", response_code);

  	if (response_code == RC_TMF_SUCCEEDED)
-		devinfo->tmf_perfomed = true;
+		devinfo->tmf_performed = true;
  	else
-		devinfo->tmf_perfomed = false;
+		devinfo->tmf_performed = false;
  }

  static bool uas_evaluate_response_iu(struct response_iu *riu, struct scsi_cmnd *cmnd)
@@ -837,12 +837,9 @@ static int uas_eh_abort_handler(struct scsi_cmnd *cmnd)
  	struct task_mgmt_iu *tmf = devinfo->tmf_iu;
  	struct urb *data_in_urb = NULL;
  	struct urb *data_out_urb = NULL;
-	unsigned long flags;
-	int err, time;
+	int err, time = 0;
  	int success = FAILED;

-	spin_lock_irqsave(&devinfo->lock, flags);
-
  	uas_log_cmd_state(cmnd, __func__, 0);

  	/* Ensure that try_complete does not call scsi_done */
@@ -852,6 +849,7 @@ static int uas_eh_abort_handler(struct scsi_cmnd *cmnd)
  	devinfo->deathrow = cmnd;
  	devinfo->tmf_catastrophic = false;
  	devinfo->tmf_error = false;
+	devinfo->tmf_performed = false;

  	prepare_tmf_urb(devinfo, cmnd);
  	prepare_ABORT_TASK_tmf(tmf, cmdinfo);
@@ -869,14 +867,7 @@ static int uas_eh_abort_handler(struct scsi_cmnd *cmnd)
  	time = wait_for_completion_timeout(&devinfo->deathknell, USB_CTRL_GET_TIMEOUT);
  	/* in case of timeout */
  	usb_kill_urb(devinfo->management_urb);
-	if (time && devinfo->tmf_perfomed) {
-		cmdinfo->state &= ~COMMAND_ABORTING;
-		/*
-		 * manually finish as resources must be freed only once
-		 */
-		cmnd->result = DID_ABORT << 16;
-		cmnd->scsi_done(cmnd);
-	}
+
  	if (devinfo->tmf_catastrophic)
  		success = FAST_IO_FAIL;
  	else if (devinfo->tmf_error)
@@ -893,8 +884,6 @@ give_up:

  	uas_free_unsubmitted_urbs(cmnd);

-	spin_unlock_irqrestore(&devinfo->lock, flags);
-
  	if (data_in_urb) {
  		usb_kill_urb(data_in_urb);
  		usb_put_urb(data_in_urb);
@@ -903,7 +892,15 @@ give_up:
  		usb_kill_urb(data_out_urb);
  		usb_put_urb(data_out_urb);
  	}
-
+	/* must be done after the URBs are sure to be dead */
+	if (time && devinfo->tmf_performed) {
+		cmdinfo->state &= ~COMMAND_ABORTING;
+		/*
+		 * manually finish as resources must be freed only once
+		 */
+		cmnd->result = DID_ABORT << 16;
+		cmnd->scsi_done(cmnd);
+	}
  	return success;

I believe we should not return success when timedout or when
devinfo->tmf_performed is not set.

  }

Your commit msg mentions a race between TMF abort and regular finish,
but what if the data urbs regular finish races with:

        if (cmdinfo->state & DATA_IN_URB_INFLIGHT)
                data_in_urb = usb_get_urb(cmdinfo->data_in_urb);
        if (cmdinfo->state & DATA_OUT_URB_INFLIGHT)
                data_out_urb = usb_get_urb(cmdinfo->data_out_urb);

So you call usb_get_urb(cmdinfo->data_out_urb) which races
with the usb_free_urb() in uas_data_cmplt() ?

Also I believe that you are trusting the hardware way too much
here, malicious (or buggy) hardware could wait for the tmf abort, then finish
normally and report the tmf abort as successful leading to a double scsi_done().

I believe you need to do something like this on abort from the scsi core:

1) take devinfo->lock
2) check if the cmd has not completed normally before you managed to
   get the lock (I don't know how to do this, maybe the core has a way
   todo this, if not we need to use cmdinfo->state)
   If it has completed normally before we had the lock, drop the lock
   return success
3) set a flag in cmdinfo->state to make all the urb completion handlers
   not touch the cmd anymore (nor free related urbs), and modify all the
   urb completion handlers and uas_do_work accordingly (the current upstream
   driver already has some code for this, but since you now drop the lock
   halfway through you need to double check all this)
4) drop devinfo->lock
5) do the abort including sleeping for it to complete, etc.
6) on failure return failure
7) Take devinfo->lock again to avoid racing with uas_data_cmplt
8) get, kill and put data urbs if they are still in flight
9) complete the command using uas_try_complete(), so that urbs which never
   got submitted get freed.
10) What about the sense urb for the cmd, possibly this did not complete ??
   with usb-3 we can just store it in cmdinfo and kill it as it is linked to
   the tag / bulk-stream for the cmd being cancelled).

   But with usb-2 it is not linked to the tag, and we assume that sense urbs
   may not complete when aborting a command then we end up leaking a sense urb
   since we do not know which sense urb belongs to which in flight command
   (as they are not coupled to tags, we simply submit as many sense urbs
    as commands) so if we assume we may end up with an un-completed sense
   urb after an abort, then we need to change how we deal with sense-urbs
   when operating in usb-2 mode and instead of submitting one per cmd,
   have a pool of them, which we (re-)submit on cmd submission / completion
   up to a max of say 3 (having more in flight makes little sense), but never
   more then we've cmds in flight, as I do not know what happens when submitting
   sense urbs when the hardware has no cmds to process and I would rather not
   find out.
11) drop devinfo->lock
12) return success

Note that similar worries apply to the uas_eh_device_reset_handler() code which
I've not yet taken a close look at. I believe it would be best to first
figure out abort, and then we can add device_reset support later.

Regards,

Hans



@@ -927,7 +924,7 @@ static int uas_eh_device_reset_handler(struct scsi_cmnd *cmnd)
  		goto failure;

  	time = wait_for_completion_timeout(&devinfo->deathknell, USB_CTRL_GET_TIMEOUT);
-	if (time && devinfo->tmf_perfomed)
+	if (time && devinfo->tmf_performed)
  		success = SUCCESS;
  	usb_kill_urb(devinfo->management_urb);


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux