Re: [PATCH 0/4] scsi: 64-bit LUN support

Tomas Henzl <thenzl@xxxxxxxxxx> · Mon, 08 Apr 2013 17:37:10 +0200

On 04/05/2013 05:24 PM, James Smart wrote:
>
> On 4/4/2013 6:17 AM, Hannes Reinecke wrote:
>> On 03/31/2013 07:44 PM, Tomas Henzl wrote:
>>> What we can do is to decode the LUN and compare it to max_lun provided by the driver,
>>> I think that sg_luns is able to do that, so what is needed is just to follow the SAM.
>>>
>>> I have seen reports of this problem on three different drivers connected to various
>>> external storage, all of them with the same basic cause - the driver sets a max_lun
>>> and then a LUN comes encoded with a newer addressing method and something like this is shown
>>> 'kernel: scsi: host 2 channel 0 id 2 lun16643 has a LUN larger than allowed by the host adapter'
>>>
>>> Decoding the real LUN value would fix this problem. By decoding I mean only its use in
>>> scsi_report_lun_scan; the LUN would be stored exactly the same way as it is now.
>>> I know we can patch the individual drivers too, but if max_lun were what the name says
>>> - the max LU number - it would fix my problem very easily.
>>>
>> Errm.
>>
>> No. Decoding LUNs is _evil_. It has only a relevance on the target,
>> and even then it might choose to ignore it.
>> So we cannot try to out-guess the target here.
OK, I can see the problems with decoding the LUN. One of them is the need to
encode the LUN again into address format + number. I'm not sure whether the hw
would work if another address mode were used.

If we treat the LUN as a complex structure, then it makes no sense
to compare it to max_lun as a plain number - http://lxr.linux.no/#linux+v3.8.6/drivers/scsi/scsi_scan.c#L1471
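
To make the "complex structure" point concrete, here is a rough userspace
sketch of what decoding could look like. It assumes the integer LUN carries
the first two bytes of the 8-byte LUN in its low 16 bits, and it handles only
the peripheral device and flat space address methods from SAM. The names
lun_addr_method() and lun_decode() are made up for illustration; they are
not existing kernel functions.

```c
#include <stdint.h>

/* Address method: the top two bits of the first LUN byte (SAM). */
enum lun_addr_method {
	LUN_ADDR_PERIPHERAL   = 0,	/* 00b: peripheral device addressing */
	LUN_ADDR_FLAT         = 1,	/* 01b: flat space addressing */
	LUN_ADDR_LOGICAL_UNIT = 2,	/* 10b: logical unit addressing */
	LUN_ADDR_EXTENDED     = 3	/* 11b: extended logical unit addressing */
};

/* Hypothetical helper: address method of a first-level 16-bit LUN field. */
static enum lun_addr_method lun_addr_method(uint16_t field)
{
	return (enum lun_addr_method)(field >> 14);
}

/*
 * Hypothetical helper: return the LU number encoded in the field,
 * or -1 for anything this sketch does not handle.
 */
static int lun_decode(uint16_t field)
{
	switch (lun_addr_method(field)) {
	case LUN_ADDR_PERIPHERAL:
		/* Bits 13..8 are the bus identifier; only bus 0 is handled. */
		if ((field >> 8) & 0x3f)
			return -1;
		return field & 0x00ff;
	case LUN_ADDR_FLAT:
		/* The low 14 bits are the LU number. */
		return field & 0x3fff;
	default:
		return -1;
	}
}
```

Under these assumptions the value from the log message, 16643 (0x4103), has
address method 01b (flat space) and lun_decode() returns 259 (0x103).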

>> The error you're reporting is that lpfc is setting max_luns to
>> '255', which of course is less than 16643. Increasing max_luns on
>> lpfc to '0xFFFF' will fix your problem; nothing to do with 64-bit
>> LUNs ...
I don't think I mentioned lpfc, but it doesn't matter.
Fixing this in individual drivers by increasing max_lun is not easy,
because the firmware could have some reasons for the max lun (some tables, ...,
the fact is I have no idea how this is implemented in the hw).
If the fix for this were just to set max_lun to 0xFFFF in every driver,
it would mean that we could remove max_lun and the test completely.

A kernel option like 'ignore_max_lun' would help, but I somehow dislike it.
What do you think?

> The reason lpfc set max_luns to 255 is due to the midlayer using 
> max_luns as a (SCSI-2 device) max sequential scan loop top value, not 
> necessarily as a max lun # as what's now in the report luns scan loop. 
> When we were attached to jbods (loop, etc) - we saw 2 problems: our scan 
> time dramatically increased (several minutes based on a 16k max_lun 
> value); and as the jbod only decoded 8 bits - it happened to respond 
> successfully to any lun value where the lower 8-bits were 0, meaning 
> lots of midlayer "ghost" devices were created when in reality there was 
> only 1 lun present. Changing the max_luns value is fine as long as
> you know what's attached.
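
The ghost device effect described above can be illustrated with a toy model.
This is only a sketch, not the midlayer code: jbod_responds() and
sequential_scan() are hypothetical names, the target is assumed to have a
single LU at 0 and to look only at the low 8 bits of the LUN, "16k" is taken
to mean 16384, and details of the real scan (such as stopping early) are
ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy target: it decodes only the low 8 bits of the LUN and has one
 * LU at 0, so every LUN whose low 8 bits are 0 appears to exist.
 */
static bool jbod_responds(uint32_t lun)
{
	return (lun & 0xff) == 0;
}

/* Probe LUNs 0 .. max_lun - 1 in order and count those that answer. */
static unsigned int sequential_scan(uint32_t max_lun)
{
	unsigned int found = 0;
	uint32_t lun;

	for (lun = 0; lun < max_lun; lun++)
		if (jbod_responds(lun))
			found++;

	return found;
}
```

In this model sequential_scan(256) finds only the one real LU, while
sequential_scan(16384) finds 64 (LUN 0, 256, 512, ...), so 63 of them
would be ghosts.
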
>
> -- james s
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
