Re: Debugging scsi abort handling ?

On 08/29/2014 06:39 AM, Finn Thain wrote:

On Thu, 28 Aug 2014, Hannes Reinecke wrote:

What might happen, though, is that the command is already dead and gone by
the time you're calling ->scsi_done() (if you call it after eh_abort).
So there might not _be_ a command upon which you can call ->scsi_done()
to start with.

Hence any LLDD needs to clear up any internal references after a call to
eh_XXX to ensure it doesn't call ->scsi_done() on an invalid command.

So even if the LLDD returns 'FAILED' upon a call to eh_XXX it _still_
needs to clear up the internal reference.

This is a question that has been bothering me too. If the host's
eh_abort_cmd() method returns FAILED, it seems the mid-layer is liable to
re-issue the same command to the LLD (?)

No.
FAILED from eh_abort_cmd() means that the TMF (task management function) hasn't been sent.
So the midlayer escalates to the next EH step.
The command will only ever be re-issued once EH completes.
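
As a rough illustration of that escalation, here is a simplified user-space sketch. It is not the midlayer's actual code: every name in it (esc_run, esc_abort, the ESC_* values) is hypothetical, and the real EH ladder has more steps and more bookkeeping. It only shows that a FAILED step hands over to the next one, and that steps after the first SUCCESS are never invoked.

```c
/* User-space sketch; all names are hypothetical, not kernel symbols. */
#include <assert.h>
#include <stddef.h>

enum esc_result { ESC_SUCCESS, ESC_FAILED };

typedef enum esc_result (*esc_step_fn)(void);

/* Call counters so the demo can verify which steps actually ran. */
static int esc_calls[3];

static enum esc_result esc_abort(void)        { esc_calls[0]++; return ESC_FAILED; }
static enum esc_result esc_device_reset(void) { esc_calls[1]++; return ESC_SUCCESS; }
static enum esc_result esc_bus_reset(void)    { esc_calls[2]++; return ESC_SUCCESS; }

/*
 * Walk the ladder until one step reports SUCCESS.
 * Returns the index of that step, or -1 if every step failed.
 * Only after this returns would a command be considered for re-issue.
 */
static int esc_run(const esc_step_fn *steps, size_t n)
{
	size_t i;

	assert(steps != NULL);
	for (i = 0; i < n; i++)
		if (steps[i]() == ESC_SUCCESS)
			return (int)i;
	return -1;
}

/* Returns 0 if the ladder behaved as described, non-zero otherwise. */
int esc_demo(void)
{
	const esc_step_fn ladder[] = { esc_abort, esc_device_reset, esc_bus_reset };
	int handled_by;

	esc_calls[0] = esc_calls[1] = esc_calls[2] = 0;
	handled_by = esc_run(ladder, sizeof(ladder) / sizeof(ladder[0]));

	if (handled_by != 1)
		return 1;	/* abort failed, device reset should have handled it */
	if (esc_calls[0] != 1 || esc_calls[1] != 1)
		return 2;	/* each earlier step runs exactly once */
	if (esc_calls[2] != 0)
		return 3;	/* bus reset must not run after a SUCCESS */
	return 0;
}
```

Calling esc_demo() walks abort (FAILED) and device reset (SUCCESS) and never reaches the bus reset step.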

Either that or return 'FAILED' for any later eh_XXX function until the
internal references can be cleared up.

So if a command may or may not "exist" after eh_abort_handler() returns
control to the mid-layer (regardless of SUCCESS or FAILURE), then the LLD
has to be careful about keeping track of which commands were aborted, if
those commands are still in the process of cleanup when eh_abort_handler()
returns.

Yes.

It's hard to see how that can work when command pointers are only unique
while a command "exists".

Which is why we have the EH callbacks, to give the LLDD a chance to clean up internal references.

In effect, this would mean that EH functions cannot return at all, until
the relevant command(s) are completely forgotten by the LLD; and that
means the LLD itself may have to escalate abort -> device reset -> bus
reset -> etc instead of simply returning FAILED.

More often than not the LLDD has its own internal command structure, which references the midlayer SCSI command structure via a pointer.
Just clearing that pointer will do the trick.

Take e.g. lpfc:
It'll construct its internal command here:

	lpfc_cmd = lpfc_get_scsi_buf(phba, ndlp);
	if (lpfc_cmd == NULL) {
		lpfc_rampdown_queue_depth(phba);

		lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP,
				 "0707 driver's buffer pool is empty, "
				 "IO busied\n");
		goto out_host_busy;
	}

	/*
	 * Store the midlayer's command structure for the
	 * completion phase
	 * and complete the command initialization.
	 */
	lpfc_cmd->pCmd  = cmnd;
	lpfc_cmd->rdata = rdata;
	lpfc_cmd->timeout = 0;
	lpfc_cmd->start_time = jiffies;
	cmnd->host_scribble = (unsigned char *)lpfc_cmd;

and then checks for the pointer upon command completion:

static void
lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pIocbIn,
			struct lpfc_iocbq *pIocbOut)
{
	struct lpfc_scsi_buf *lpfc_cmd =
		(struct lpfc_scsi_buf *) pIocbIn->context1;

[ .. ]
	/* Sanity check on return of outstanding command */
	if (!(lpfc_cmd->pCmd))
		return;
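
To make the pattern concrete, here is a minimal user-space mock of "clear the pointer on abort, check it on completion". It is a sketch only, not lpfc code: the mock_* structures and functions are hypothetical stand-ins, and a real driver would need locking between the abort and completion paths, which is left out here.

```c
/* User-space mock; none of these names are real kernel or lpfc symbols. */
#include <assert.h>
#include <stddef.h>

struct mock_scsi_cmnd {			/* stand-in for the midlayer command */
	int done_calls;			/* how often "scsi_done" fired */
	void *host_scribble;		/* back-pointer to the LLDD command */
};

struct mock_lld_cmd {			/* stand-in for the LLDD-internal command */
	struct mock_scsi_cmnd *pCmd;	/* NULL once the LLDD no longer owns it */
};

/* Queue path: link the two structures in both directions. */
static void mock_queue(struct mock_lld_cmd *lc, struct mock_scsi_cmnd *sc)
{
	assert(lc->pCmd == NULL);	/* the internal slot must be free */
	lc->pCmd = sc;
	sc->host_scribble = lc;
}

/* Abort path: drop the reference, whatever the outcome of the abort. */
static void mock_eh_abort(struct mock_scsi_cmnd *sc)
{
	struct mock_lld_cmd *lc = sc->host_scribble;

	if (lc != NULL)
		lc->pCmd = NULL;
	sc->host_scribble = NULL;
}

/*
 * Completion path: returns 1 if the midlayer command was completed,
 * 0 if the reference was already cleared and completion was skipped.
 */
static int mock_complete(struct mock_lld_cmd *lc)
{
	struct mock_scsi_cmnd *sc = lc->pCmd;

	if (sc == NULL)
		return 0;		/* sanity check, as in the excerpt */
	lc->pCmd = NULL;
	sc->host_scribble = NULL;
	sc->done_calls++;		/* stands in for ->scsi_done() */
	return 1;
}

/* Returns 0 if a late completion after abort is safely ignored. */
int mock_abort_demo(void)
{
	struct mock_scsi_cmnd sc = { 0, NULL };
	struct mock_lld_cmd lc = { NULL };

	mock_queue(&lc, &sc);
	mock_eh_abort(&sc);
	if (mock_complete(&lc) != 0)	/* late completion from the hardware */
		return 1;
	if (sc.done_calls != 0)		/* "scsi_done" must not have fired */
		return 2;
	return 0;
}
```

mock_abort_demo() queues a command, aborts it, and then delivers a late completion; the completion finds a NULL pointer and never touches the midlayer command.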

But indeed, 'FAILED' is not very meaningful here, leaving the midlayer with no information about what happened to the command.

Personally I would like to enforce this meaning on the eh_XXX callbacks:
- upon each eh_XXX callback the LLDD clears any internal references
  to the command / command scope (i.e. eh_abort_cmd clears the
  references to the command, eh_lun_reset clears all internal
  references to commands for this ITL nexus, etc.).
  This happens irrespective of the return code.
- The eh_XXX callback shall return 'FAILED' if the respective
  TMF (or equivalent) could not be initiated.
- The eh_XXX callback shall return 'SUCCESS' if the respective
  TMF (or equivalent) could be initiated.
- After each eh_XXX callback control for this command / command
  scope is transferred back to the midlayer; the LLDD shall not
  assume the associated command structures to remain valid after
  that point.
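
The rules above could look roughly like this in a user-space mock. This is a sketch under assumptions, not a proposed kernel interface: the eh_* names, the slot table, the EH_SUCCESS / EH_FAILED placeholders and the tmf_works knob are all hypothetical and exist only to show that references are dropped for the whole scope regardless of the return code.

```c
/* User-space mock of the proposed contract; all names are hypothetical. */
#include <assert.h>
#include <stddef.h>

enum eh_result { EH_SUCCESS, EH_FAILED };	/* placeholder return codes */

#define EH_SLOTS 4

struct eh_mid_cmd { int lun; };			/* stand-in midlayer command */

struct eh_lld_slot { struct eh_mid_cmd *cmd; };	/* LLDD-internal reference */

struct eh_lld_host {
	struct eh_lld_slot slot[EH_SLOTS];
	int tmf_works;				/* knob: can a TMF be initiated? */
};

/* Number of midlayer commands the LLDD still references. */
static int eh_refs(const struct eh_lld_host *h)
{
	int i, n = 0;

	for (i = 0; i < EH_SLOTS; i++)
		if (h->slot[i].cmd != NULL)
			n++;
	return n;
}

/* Abort scope: one command. References go away whatever we return. */
static enum eh_result eh_abort(struct eh_lld_host *h, struct eh_mid_cmd *cmd)
{
	int i;

	assert(cmd != NULL);
	for (i = 0; i < EH_SLOTS; i++)
		if (h->slot[i].cmd == cmd)
			h->slot[i].cmd = NULL;
	return h->tmf_works ? EH_SUCCESS : EH_FAILED;
}

/* LUN reset scope: every command outstanding on that LUN. */
static enum eh_result eh_lun_reset(struct eh_lld_host *h, int lun)
{
	int i;

	for (i = 0; i < EH_SLOTS; i++)
		if (h->slot[i].cmd != NULL && h->slot[i].cmd->lun == lun)
			h->slot[i].cmd = NULL;
	return h->tmf_works ? EH_SUCCESS : EH_FAILED;
}

/* Returns 0 if the mock follows the four rules listed above. */
int eh_contract_demo(void)
{
	struct eh_mid_cmd a = { 0 }, b = { 0 }, c = { 1 };
	struct eh_lld_host h = { { { &a }, { &b }, { &c }, { NULL } }, 0 };

	/* TMF cannot be initiated: FAILED, yet the reference to 'a' is gone. */
	if (eh_abort(&h, &a) != EH_FAILED || eh_refs(&h) != 2)
		return 1;
	/* TMF can be initiated: SUCCESS, and everything on LUN 0 is dropped. */
	h.tmf_works = 1;
	if (eh_lun_reset(&h, 0) != EH_SUCCESS || eh_refs(&h) != 1)
		return 2;
	/* The command on LUN 1 is outside the scope and stays referenced. */
	if (h.slot[2].cmd != &c)
		return 3;
	return 0;
}
```

In eh_contract_demo() the failed abort still leaves the LLDD without a reference to the aborted command, which is the property the midlayer would rely on before reusing the command structure.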

I'm tempted to enshrine this in the documentation;
that surely will help me during the EH cleanup.
And Hans will have some guidelines on how to design uas EH :-)

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)