Re: [PATCH 0/4] scsi: 64-bit LUN support

Hannes Reinecke <hare@xxxxxxx> · Mon, 08 Apr 2013 16:06:41 +0200

On 04/05/2013 05:24 PM, James Smart wrote:
> 
> On 4/4/2013 6:17 AM, Hannes Reinecke wrote:
>> On 03/31/2013 07:44 PM, Tomas Henzl wrote:
>>> What we can do is decode the LUN and compare it to the max_lun
>>> provided by the driver. I think sg_luns is able to do that, so all
>>> that is needed is to follow SAM.
>>>
>>> I have seen reports of this problem on three different drivers
>>> connected to various external storage, all with the same root
>>> cause: the driver sets a max_lun, then a LUN arrives encoded with a
>>> newer addressing method, and something like this is shown:
>>> 'kernel: scsi: host 2 channel 0 id 2 lun16643 has a LUN larger
>>> than allowed by the host adapter'
>>>
>>> Decoding the real LUN value would fix this problem; by "decoding" I
>>> only mean its use in scsi_report_lun_scan. The LUN would be stored
>>> exactly the same way as it is now.
>>> I know we could patch the individual drivers too, but if max_lun
>>> were what its name says (the maximum LU number), it would fix my
>>> problem very easily.
>>>
>> Errm.
>>
>> No. Decoding LUNs is _evil_. It is only relevant to the target,
>> and even the target might choose to ignore it.
>> So we cannot try to out-guess the target here.
>>
>> The error you're reporting is that lpfc sets max_luns to '255',
>> which of course is less than 16643. Increasing max_luns on lpfc to
>> '0xFFFF' will fix your problem; it has nothing to do with 64-bit
>> LUNs ...
>>
> 
> The reason lpfc set max_luns to 255 is that the midlayer uses
> max_luns as the upper bound of its (SCSI-2 device) sequential scan
> loop, not necessarily as a maximum LUN number as it is now used in
> the REPORT LUNS scan loop. When we were attached to JBODs (loop,
> etc.), we saw two problems: our scan time increased dramatically
> (several minutes with a 16k max_lun value), and since the JBOD only
> decoded 8 bits, it responded successfully to any LUN value whose
> lower 8 bits were 0, so the midlayer created lots of "ghost" devices
> when in reality only one LUN was present. Changing the max_luns
> value is fine as long as you know what's attached.
> 
Well, these are actually _two_ issues: one is sequential scan time
scaling with max_luns, the other is a JBOD misbehaving when
addressed with LUN values wider than 8 bits.

Yes, it is true that sequential scan scales linearly with max_luns,
so scanning 16k LUNs _does_ take some time.
We had the same issue with older EMC Clariion or Symmetrix arrays,
which announced themselves as SCSI-2 devices.
This is why we introduced the BLIST_REPORTLUN2 flag ...

However, this is only an issue if you have 'sparse_lun' set.

I would declare the first a non-issue, as sequential scanning should
stop after the first invalid device, unless 'sparse_lun' is set,
and that must be set explicitly via blacklist flags (BLIST_SPARSELUN).
And using 'sparse_lun' is _always_ asking for trouble, especially
on a known broken device ...
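To illustrate the difference, here is a minimal Python model (not the
kernel's scan code) of a sequential scan against a JBOD like the one
James describes, i.e. one that decodes only the low 8 bits of the LUN.
The function names and the 16384 upper bound (James's "16k") are
illustrative assumptions; INQUIRY handling and all other blacklist
flags are ignored.

```python
from typing import Tuple


def jbod_responds(lun: int) -> bool:
    """Hypothetical JBOD that decodes only the low 8 bits of the LUN,
    so LUN 0, 256, 512, ... all answer as if they were LUN 0."""
    return (lun & 0xFF) == 0


def sequential_scan(max_lun: int, sparse_lun: bool) -> Tuple[int, int]:
    """Simplified sequential scan over LUN 0 .. max_lun - 1.

    Without sparse_lun the scan stops at the first LUN that does not
    answer; with sparse_lun it keeps probing all the way to max_lun.
    Returns (probes issued, devices found, ghosts included).
    """
    probes = devices = 0
    for lun in range(max_lun):
        probes += 1
        if jbod_responds(lun):
            devices += 1
        elif not sparse_lun:
            break
    return probes, devices


if __name__ == "__main__":
    for sparse in (False, True):
        probes, devices = sequential_scan(16384, sparse)
        print(f"sparse_lun={sparse}: {probes} probes, {devices} devices")
```

With 16384 as the limit, the plain scan issues 2 probes and finds 1
device, while the sparse_lun scan issues 16384 probes and reports 64
devices, 63 of them ghosts of LUN 0.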

Do you happen to know what make the JBOD was?
I would rather advocate adding another BLIST flag here instead
of degrading the entire SCSI host ...
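For reference, the decoding proposed in the quoted report can be
sketched as a minimal Python model. It is not midlayer code: the
packing of the integer LUN, the restriction to two SAM address
methods, and the form of the max_lun check are all assumptions, and
the function names are hypothetical.

```python
from typing import Optional, Tuple


def first_level_bytes(lun: int) -> Tuple[int, int]:
    """Assumption: the midlayer's integer LUN packs the first two bytes
    of the 8-byte SCSI LUN as (byte0 << 8) | byte1."""
    return (lun >> 8) & 0xFF, lun & 0xFF


def decode_first_level(lun: int) -> Optional[Tuple[str, int]]:
    """Hypothetical SAM decoder for the first addressing level.

    Only two address methods (bits 7-6 of byte 0) are handled:
      00b peripheral device addressing, bus 0 -> LUN is byte 1
      01b flat space addressing -> 14-bit LUN
    Anything else returns None.
    """
    byte0, byte1 = first_level_bytes(lun)
    method = byte0 >> 6
    if method == 0b00 and (byte0 & 0x3F) == 0:
        return "peripheral", byte1
    if method == 0b01:
        return "flat", ((byte0 & 0x3F) << 8) | byte1
    return None


def exceeds_max_lun(lun: int, max_lun: int) -> bool:
    """Assumed form of the check behind 'has a LUN larger than allowed
    by the host adapter': the raw LUN value compared with max_lun."""
    return lun > max_lun


if __name__ == "__main__":
    lun = 16643  # value from the log message in the quoted report
    print(hex(lun), decode_first_level(lun))
    for max_lun in (255, 0xFFFF):
        print(max_lun, exceeds_max_lun(lun, max_lun))
```

Under these assumptions lun16643 is 0x4103, a flat-space LUN 259, so
even the decoded value would still exceed a max_luns of 255, whereas a
max_luns of 0xFFFF accepts the raw value as-is.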

Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)