Re: [PATCH 1/2] scsi_scan: Send TEST UNIT READY to the LUN before scanning

On 06/11/2014 04:46 PM, James Bottomley wrote:
On Wed, 2014-06-11 at 16:33 +0200, Hannes Reinecke wrote:
On 06/11/2014 04:24 PM, James Bottomley wrote:
On Thu, 2014-06-05 at 09:26 +0200, Hannes Reinecke wrote:
REPORT_LUN_SCAN does not report any outstanding unit attention
condition as per SAM. However, the target might not be fully
initialized at that time, so we might end up getting a
default entry (or even a partially filled one).
But as we're not able to process the REPORT LUN DATA HAS CHANGED
unit attention correctly we'll be missing out some LUNs during
startup.
So it's better to send a TEST UNIT READY for modern implementations
and wait until the unit attention condition goes away.

Are you sure this is a good idea: we just spent ages tuning SCSI init so
we don't slow systems down.  This patch, in the event the array is
having a power on problem, takes us right back to waiting for init
again ... basically the busy wait in scsi_test_lun.

Since the array should send us a UA anyway when it's got itself sorted
out, what's wrong with just processing the report luns data has changed
condition?

Because we can't.

_If_ we were attempting this we'd run into several issues:
a) Boot will fail, as REPORT LUNs will return 0 LUNs (or just LUN 0).
     So the scanning code will assume everything's fine. Booting will
     continue, only to figure out that no LUNs are present.
     As there is _no_ indication that REPORT LUNs should indeed have
     returned an error (only it can't due to SAM) we wouldn't even
     now that there _is_ an issue.
     (In fact, that's what triggered the patchset in the first place.)
b) Even _if_ we're able to somehow recover from that we will have
     to rescan the host and any attached devices.
     The only way to do this currently is to _remove_ all devices
     from that host and then do a full rescan.
     Trying this with any devices which are already part of some
     complex setup will become ... interesting.

OK, go back to first principles and tell us what the actual problem is,
with traces and details.  Is this some weird SCSI-3 device with a single
LUN that's screwing up report luns ... in which case we can just
blacklist it.  Or is it boot from an array?

The problem is as follows:

> Right after the "inquiry" the scsi subsystem sends a "report luns"
> to the RAID array.
> The RAID answers the "report luns" with only the 8 byte header
> and an empty (i.e. not existing) LUN list after this header
> because the LUNs still execute their initialization phase and
> did not reach their ready state yet.
> The RAID manufacturer describes this behaviour as an indication
> for: "there are no LUNs available".
>
> Then immediately follows a "test unit ready" command from the
> scsi subsystem to LUN 0  which is answered by the RAID firmware
> with a "check condition"  "not ready, initialisation in progress".
>
As per SPC 'REPORT LUNS' cannot return any check condition.
So we cannot tell from the 'REPORT LUNS' response alone
whether the returned LUN list is valid or not.
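
To illustrate why the response alone is ambiguous, here is a minimal
sketch of decoding the REPORT LUNS parameter data described above: a
4-byte big-endian LUN LIST LENGTH, 4 reserved bytes, then 8-byte LUN
entries. report_luns_count() is a hypothetical helper written for this
mail, not a function from scsi_scan.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): return the number of 8-byte LUN
 * entries present in a REPORT LUNS parameter data buffer, or -1 if the
 * buffer is malformed.
 */
static int report_luns_count(const uint8_t *buf, size_t len)
{
	uint32_t list_len;

	assert(buf != NULL);

	if (len < 8)
		return -1;		/* header is truncated */

	/* Bytes 0..3: LUN LIST LENGTH in bytes, big-endian. Bytes 4..7: reserved. */
	list_len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		   ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

	if (list_len % 8 != 0)
		return -1;		/* entries are always 8 bytes each */

	/* Only count entries that actually fit in the transferred data. */
	if (list_len > len - 8)
		list_len = (uint32_t)((len - 8) & ~(size_t)7);

	return (int)(list_len / 8);
}
```

A header-only response decodes to a count of 0 with no error, which is
exactly what a target without any LUNs would return, too.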

Hence my approach to send a TEST UNIT READY prior to REPORT LUN,
as this would return any outstanding unit attention codes and
we can wait until the initialisation is finished.
Plus we're sending a TEST UNIT READY anyway when we're scanning
the LUN from sd.c:sd_spinup_disk(), so in effect we're just
moving the call.
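
As a rough sketch of the decision such a TEST UNIT READY loop has to
make, the following classifies fixed-format sense data into "ready",
"retry" or "give up". It is not the code from the patch; tur_classify()
and the enum are hypothetical names, and which conditions are worth
retrying is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, not taken from the patch. */
enum tur_action { TUR_READY, TUR_RETRY, TUR_GIVE_UP };

#define SK_NOT_READY		0x02
#define SK_UNIT_ATTENTION	0x06

/*
 * Classify the outcome of a TEST UNIT READY from its sense data.
 * len == 0 means the command completed with GOOD status.
 * Only fixed-format sense (response code 70h/71h) is handled here.
 */
static enum tur_action tur_classify(const uint8_t *sense, size_t len)
{
	uint8_t key, asc, ascq;

	assert(sense != NULL || len == 0);

	if (len == 0)
		return TUR_READY;

	if (len < 14 || ((sense[0] & 0x7f) != 0x70 && (sense[0] & 0x7f) != 0x71))
		return TUR_GIVE_UP;	/* too short or not fixed format */

	key  = sense[2] & 0x0f;		/* SENSE KEY */
	asc  = sense[12];		/* ADDITIONAL SENSE CODE */
	ascq = sense[13];		/* ADDITIONAL SENSE CODE QUALIFIER */

	/* A unit attention is cleared once reported, so just ask again. */
	if (key == SK_UNIT_ATTENTION)
		return TUR_RETRY;

	/* 04h/01h: logical unit is in process of becoming ready. */
	if (key == SK_NOT_READY && asc == 0x04 && ascq == 0x01)
		return TUR_RETRY;

	return TUR_GIVE_UP;
}
```

A caller would bound the number of TUR_RETRY rounds and sleep between
them, so that a target which never becomes ready cannot stall the scan
forever.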

So the easy way out here is indeed just to send a TEST UNIT READY.
And as we're checking for reasonable SCSI compliance we should
be catching most of the oddballs.

I don't object hugely to TUR ... except it binds us to spin up because
most devices will respond not ready.  I do object to busy waiting in the
init thread until we get the right answer.

The problem is indeed in SPC:

The REPORT LUNS parameter data should be returned even though the
device server is not ready for other commands. The report of the
logical unit inventory should be available without incurring any
media access delays. If the device server is not ready with the
logical unit inventory or if the inventory list is null for the
requesting I_T nexus and the SELECT REPORT field set to 02h, then
the device server shall provide a default logical unit inventory
that contains at least LUN 0 or the REPORT LUNS well known logical
unit (see 8.2). A non-empty peripheral device logical unit inventory
that does not contain either LUN 0 or the REPORT LUNS well known
logical unit is valid.

So the above array is perfectly within spec.

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)