Re: [PATCH 1/1] scsi: Add EH Start Unit retry

Brian King <brking@xxxxxxxxxx> · Mon, 02 Apr 2007 13:57:56 -0500

Douglas Gilbert wrote:
> Brian King wrote:
>> Currently, the scsi error handler will issue a START_UNIT
>> command if the drive indicates it needs its motor started
>> and the allow_restart flag is set in the scsi_device. If,
>> after the scsi error handler invokes a host adapter reset
>> due to error recovery, a device is in a unit attention
>> state AND also needs a START_UNIT, that device will be placed
>> offline. The disk array devices on an ipr RAID adapter
>> will do exactly this when in a dual initiator configuration.
>> This patch adds a single retry to the EH initiated
>> START_UNIT.
> 
> I have no objection to this patch. Just seems a pity
> that SCSI devices go to the trouble of sending unit
> attentions while OSes just throw them away.

I agree. The reason the ipr adapter firmware added this UA
in this configuration was to support SCSI 1 reservations and
communicate to the host that any reservation previously
held to the disk array is now lost since the adapter was reset.

> Perhaps the scsi_device sysfs directory could have entries
> like:
>   last_ua_asc
>   last_ua_ascq
>   last_ua_timestamp
> where code could place the asc/ascq codes and a timestamp
> then continue doing a retry.
> Could we get a log entry, hotplug event?

If we did have a way to communicate UA's to userspace like this
it seems like it would allow usage of SCSI 1 reservations
in this config and make it easier for an out of band tool to
manage these reservations. I wonder if it would be cleaner if
UAs could simply be sent up as netlink events / uevents, so they
could contain all the information needed in one packet, rather
than having to read sysfs attributes to figure out what happened.

> Logical units may queue unit attentions (sam4r10.pdf
> section 5.8.7) so it is possible that one retry may
> not be enough. With my suggestion above, only the last
> one would persist for a reasonable time.

Yep. I've already ran into that with dual ported SAS devices.
While one retry is sufficient for the ipr disk array devices I am
trying to fix this for, I have no objection to increasing it.
Maybe its just a case of increasing it later if it ends up
being an issue.

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html