On 04/05/2013 05:24 PM, James Smart wrote:
>
> On 4/4/2013 6:17 AM, Hannes Reinecke wrote:
>> On 03/31/2013 07:44 PM, Tomas Henzl wrote:
>>> What we can do is to decode the LUN and compare it to max_lun
>>> provided by the driver,
>>> I think that sg_luns is able to do that, so what is needed is
>>> just to follow the SAM.
>>>
>>> I have seen reports of problem on three different drivers
>>> connected to various
>>> external storage, all of them having the same basic reason - the
>>> driver sets a max_lun
>>> and then LUN comes encoded with a newer addressing method and
>>> something like this is shown
>>> 'kernel: scsi: host 2 channel 0 id 2 lun16643 has a LUN larger
>>> than allowed by the host adapter'
>>>
>>> Decoding the real LUN value would fix this problem, by decoding
>>> is only meant the use in
>>> scsi_report_lun_scan. The LUN would be stored exactly the same
>>> way as it is now.
>>> I know we can patch the certain drivers too, but when max_lun
>>> were what the name says
>>> - max LU number, it would fix my problem very easy.
>>>
>> Errm.
>>
>> No. Decoding LUNs is _evil_. It has only a relevance on the target,
>> and even then it might choose to ignore it.
>> So we cannot try to out-guess the target here.
>>
>> The error you're reporting is that lpfc is setting max_luns to
>> '255', which of course is less than 16643. Increasing max_luns on
>> lpfc to '0xFFFF' will fix your problem; nothing to do with 64-bit
>> LUNs ...
>>
>
> The reason lpfc set max_luns to 255 is due to the midlayer using
> max_luns as a (SCSI-2 device) max sequential scan loop top value,
> not necessarily as a max lun # as what's now in the report luns scan
> loop.
> When we were attached to jbods (loop, etc) - we saw 2
> problems: our scan time dramatically increased (several minutes
> based on a 16k max_lun value); and as the jbod only decoded 8 bits -
> it happened to respond successfully to any lun value where the lower
> 8-bits were 0, meaning lots of midlayer "ghost" devices were created
> when in reality there was only 1 lun present. Changing the
> max_luns value is fine as long as you know what's attached.
>
Well, these are actually _two_ issues: one is the sequential scan
scaling with max_luns, the other is a JBOD behaving badly when
addressed with LUNs that have more than 8 bits set.

Yes, it is true that sequential scan scales linearly with max_luns,
so scanning 16k LUNs _does_ take some time. We had the same issue
when using older EMC Clariion or Symmetrix arrays which announced
themselves as SCSI-2 devices. This is why we introduced the
BLIST_REPORTLUN2 flag ...
However, this will only be an issue if you have 'sparse_lun' set.

The first issue I would declare a non-issue, as sequential scanning
should stop after the first invalid device. Unless 'sparse_lun' is
set, but this must be set explicitly via blacklist flags.
And using 'sparse_lun' is _always_ asking for trouble, especially on
a known broken device ...

Do you happen to know what make the JBOD was? I would rather
advocate for adding another BLIST flag here instead of degrading the
entire scsi host ...

Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
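
To make the sequential-scan argument above concrete, here is a toy
model in C. It is not midlayer code, only a sketch under stated
assumptions: jbod_responds() and sequential_scan() are hypothetical
names, the "JBOD" is assumed to decode only the low 8 bits of the LUN
with LUN 0 being the only real device, and the scan is simplified to
a plain loop from 0 up to max_luns.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the JBOD described in this thread: it looks
 * only at the low 8 bits of the LUN, so every LUN whose low byte is 0
 * appears to answer, although only LUN 0 really exists.
 */
static bool jbod_responds(unsigned int lun)
{
	return (lun & 0xffu) == 0;
}

/*
 * Toy sequential scan over LUNs 0 .. max_luns - 1.
 * Returns the number of devices "found" and stores the number of
 * probes issued in *probes.  Without sparse_lun the scan stops at the
 * first LUN that does not respond; with sparse_lun it walks the whole
 * range.
 */
static unsigned int sequential_scan(unsigned int max_luns, bool sparse_lun,
				    unsigned int *probes)
{
	unsigned int found = 0;
	unsigned int lun;

	*probes = 0;
	for (lun = 0; lun < max_luns; lun++) {
		(*probes)++;
		if (jbod_responds(lun))
			found++;
		else if (!sparse_lun)
			break;	/* first invalid LUN ends the scan */
	}
	return found;
}
```

With max_luns = 16384 this model issues 2 probes and finds 1 device
when sparse_lun is off, and issues 16384 probes and finds 64 devices
(63 of them ghosts) when sparse_lun is on, which is the behaviour the
two issues above boil down to.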