Re: [PATCH] libata/sas: only set FROZEN flag if new EH is supported

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2011-06-23 at 14:31 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-06-21 at 15:30 -0500, Brian King wrote:
> > Looks good to me. Jeff/Tejun - any issues with merging this?
> 
> BTW. Current upstream with that patch applied on a machine here leads to
> several oddities, I don't know at this point whether any of that is
> actually a regression :

Ooops... pressed "send" too quickly. Here's a log excerpt with some
comments:

ipr: IBM Power RAID SCSI Device Driver version: 2.5.2 (April 27, 2011)
ipr 0000:04:00.0: Found IOA with IRQ: 129
ipr 0000:04:00.0: Initializing IOA.
ipr 0000:04:00.0: Starting IOA initialization sequence.
ipr 0000:04:00.0: Adapter firmware version: 04220029
ipr 0000:04:00.0: IOA initialized.
scsi0 : IBM 2B4C Storage Adapter
scsi 0:0:4:0: Direct-Access     IBM      ST9300603SS      BB09 PQ: 0 ANSI: 6
scsi scan: INQUIRY result too short (5), using 36

-> Are these odd INQUIRY results expected ?

scsi 0:0:5:0: Direct-Access                                    PQ: 0 ANSI: 0
scsi 0:0:6:0: Direct-Access     IBM      ST9300603SS      BB09 PQ: 0 ANSI: 6
scsi 0:0:7:0: Direct-Access     IBM      ST9300603SS      BB09 PQ: 0 ANSI: 6
scsi scan: INQUIRY result too short (5), using 36
scsi 0:0:18:0: Direct-Access                                    PQ: 0 ANSI: 0
scsi 0:2:18:0: Enclosure         IBM      PSBPD6E4A  3GSAS 0109 PQ: 0 ANSI: 4
scsi: unknown device type 31
scsi 0:255:255:255: No Device         IBM      2B4C001SISIOA    0150 PQ: 0 ANSI: 0

-> The above looks odd, not sure what it means

ipr 0000:05:00.0: Found IOA with IRQ: 130
ipr 0000:05:00.0: Initializing IOA.
scsi 0:254:0:0: Processor         IBM      57CB001SISIOA    0150 PQ: 0 ANSI: 0
ipr 0000:05:00.0: Starting IOA initialization sequence.
ipr 0000:05:00.0: Adapter firmware version: 04220029
ipr 0000:05:00.0: IOA initialized.
scsi1 : IBM 57CB Storage Adapter
scsi scan: INQUIRY result too short (5), using 36
scsi 1:0:4:0: Direct-Access                                    PQ: 0 ANSI: 0
scsi 1:0:5:0: Direct-Access     IBM      ST9300603SS      BB09 PQ: 0 ANSI: 6
scsi scan: INQUIRY result too short (5), using 36
scsi 1:0:6:0: Direct-Access                                    PQ: 0 ANSI: 0
scsi 1:0:7:0: Direct-Access     IBM      ST9300603SS      BB09 PQ: 0 ANSI: 6
scsi 1:0:18:0: Enclosure         IBM      PSBPD6E4A  3GSAS 0109 PQ: 0 ANSI: 4
scsi: On host 1 channel 0 id 18 only 511 (max_scsi_report_luns) of 402653184 luns reported, try increasing max_scsi_report_luns.
scsi: host 1 channel 0 id 18 lun 0xc0000007f01e810f has a LUN larger than currently supported.

-> Now that looks horribly wrong... that LUN number looks like a kernel pointer

scsi scan: INQUIRY result too short (5), using 36
scsi 1:2:18:0: Direct-Access                                    PQ: 0 ANSI: 0
ata1.00: ATAPI: IBM     RMBO0040532, SA61, max UDMA/100
ata1.00: failed to IDENTIFY (device reports invalid type, err_mask=0x0)
ata1.00: revalidation failed (errno=-22)
ata1.00: disabled

-> So SATA works "better" with the patch but doesn't actually work properly :-)

scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

-> That error could give us more info... not sure what it means, we do have plenty of 
memory...
 
scsi 1:8:0:0: Enclosure         IBM      VSBPD6E4B  3GSAS   01 PQ: 0 ANSI: 2
scsi: unknown device type 31
scsi 1:255:255:255: No Device         IBM      57CB001SISIOA    0150 PQ: 0 ANSI: 0
work_for_cpu used greatest stack depth: 9520 bytes left
st: Version 20101219, fixed bufsize 32768, s/g segs 256
sd 0:0:4:0: [sda] 585937500 512-byte logical blocks: (300 GB/279 GiB)
sd 0:0:5:0: [sdb] 585937500 512-byte logical blocks: (300 GB/279 GiB)
sd 0:0:6:0: [sdc] 585937500 512-byte logical blocks: (300 GB/279 GiB)
sd 0:0:7:0: [sdd] 585937500 512-byte logical blocks: (300 GB/279 GiB)
sd 0:0:18:0: [sde] READ CAPACITY failed
sd 0:0:18:0: [sde]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:18:0: [sde]  Sense Key : Illegal Request [current] 
sd 0:0:18:0: [sde]  Add. Sense: Invalid command operation code

-> Any idea what's up with that guy ?

sd 0:0:4:0: Attached scsi generic sg0 type 0
sd 0:0:5:0: [sdb] Write Protect is off
sd 0:0:5:0: Attached scsi generic sg1 type 0
sd 0:0:18:0: [sde] Test WP failed, assume Write Enabled
sd 0:0:6:0: Attached scsi generic sg2 type 0
sd 0:0:18:0: [sde] Asking for cache data failed
sd 0:0:7:0: Attached scsi generic sg3 type 0
sd 0:0:18:0: [sde] Assuming drive cache: write through
sd 1:2:18:0: [sdj] READ CAPACITY failed
sd 1:2:18:0: [sdj]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:2:18:0: [sdj]  Sense Key : Illegal Request [current] 
sd 1:2:18:0: [sdj]  Add. Sense: Invalid command operation code

-> And this one ?

There's more below ...

sd 0:0:18:0: Attached scsi generic sg4 type 0
sd 1:0:4:0: [sdf] 585937500 512-byte logical blocks: (300 GB/279 GiB)
scsi 0:2:18:0: Attached scsi generic sg5 type 13
sd 1:0:6:0: [sdh] 585937500 512-byte logical blocks: (300 GB/279 GiB)
scsi 0:255:255:255: Attached scsi generic sg6 type 31
scsi 0:254:0:0: Attached scsi generic sg7 type 3
sd 1:0:4:0: Attached scsi generic sg8 type 0
sd 1:2:18:0: [sdj] Test WP failed, assume Write Enabled
sd 1:0:5:0: Attached scsi generic sg9 type 0
sd 1:0:6:0: Attached scsi generic sg10 type 0
sd 1:0:7:0: [sdi] 585937500 512-byte logical blocks: (300 GB/279 GiB)
sd 1:2:18:0: [sdj] Asking for cache data failed
sd 1:2:18:0: [sdj] Assuming drive cache: write through
sd 1:0:7:0: Attached scsi generic sg11 type 0
scsi 1:0:18:0: Attached scsi generic sg12 type 13
sd 0:0:4:0: [sda] Write Protect is off
sd 1:2:18:0: Attached scsi generic sg13 type 0
sd 0:0:18:0: [sde] READ CAPACITY failed
sd 0:0:18:0: [sde]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:18:0: [sde]  Sense Key : Illegal Request [current] 
sd 0:0:18:0: [sde]  Add. Sense: Invalid command operation code
scsi 1:8:0:0: Attached scsi generic sg14 type 13
scsi 1:255:255:255: Attached scsi generic sg15 type 31
e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
e1000: Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver - 1.3.10-k2
e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
sd 1:0:6:0: [sdh] Write Protect is off
sd 0:0:5:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 0:0:18:0: [sde] Test WP failed, assume Write Enabled
sd 0:0:7:0: [sdd] Write Protect is off
sd 1:2:18:0: [sdj] READ CAPACITY failed
sd 1:2:18:0: [sdj]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:2:18:0: [sdj]  Sense Key : Illegal Request [current] 
sd 1:2:18:0: [sdj]  Add. Sense: Invalid command operation code
sd 0:0:18:0: [sde] Asking for cache data failed
sd 0:0:18:0: [sde] Assuming drive cache: write through
sd 0:0:18:0: [sde] Attached SCSI disk
sd 1:0:5:0: [sdg] 585937500 512-byte logical blocks: (300 GB/279 GiB)
sd 1:0:4:0: [sdf] Write Protect is off
sd 1:2:18:0: [sdj] Test WP failed, assume Write Enabled
sd 0:0:6:0: [sdc] Write Protect is off
sd 1:2:18:0: [sdj] Asking for cache data failed
sd 1:2:18:0: [sdj] Assuming drive cache: write through
sd 1:2:18:0: [sdj] Attached SCSI disk
sd 0:0:7:0: [sdd] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 0:0:4:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 1:0:6:0: [sdh] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 1:0:4:0: [sdf] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 1:0:5:0: [sdg] Write Protect is off
sd 1:0:7:0: [sdi] Write Protect is off
sd 0:0:6:0: [sdc] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 1:0:7:0: [sdi] Write cache: disabled, read cache: enabled, supports DPO and FUA
sd 1:0:5:0: [sdg] Write cache: disabled, read cache: enabled, supports DPO and FUA
 sdb: unknown partition table
 sdd: unknown partition table
 sdh: unknown partition table
 sdi: unknown partition table
 sdf: [AIX] unknown partition table
 sdg: unknown partition table
e1000e 0002:03:00.0: Disabling ASPM  L1
 sdc: unknown partition table
e1000e 0002:03:00.0: (unregistered net_device): Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
scsi 1:254:0:0: Processor         IBM      2B4C001SISIOA    0150 PQ: 0 ANSI: 0
 sda: [AIX] unknown partition table
sd 0:0:5:0: [sdb] Attached SCSI disk
sd 1:0:6:0: [sdh] Attached SCSI disk
sd 1:0:7:0: [sdi] Attached SCSI disk
sd 0:0:6:0: [sdc] Attached SCSI disk
sd 1:0:5:0: [sdg] Attached SCSI disk
sd 0:0:7:0: [sdd] Attached SCSI disk
scsi 1:254:0:0: Attached scsi generic sg16 type 3
sd 1:0:4:0: [sdf] Attached SCSI disk
sd 0:0:4:0: [sda] Attached SCSI disk

Cheers,
Ben.
> 
> 
> > Thanks,
> > 
> > Brian
> > 
> > On 06/21/2011 11:07 AM, Nishanth Aravamudan wrote:
> > > Ping on this -- Tejun, Brian, Jeff, this is a pretty annoying 2.6.39
> > > regression which it would be good to have fixed in 3.0.
> > > 
> > > Thanks,
> > > Nish
> > > 
> > > On 16.06.2011 [08:28:36 -0700], Nishanth Aravamudan wrote:
> > >> On 16.06.2011 [08:28:39 -0500], Brian King wrote:
> > >>> On 06/16/2011 02:51 AM, Tejun Heo wrote:
> > >>>> On Wed, Jun 15, 2011 at 04:34:17PM -0700, Nishanth Aravamudan wrote:
> > >>>>>> That looks like the right thing to do. For ipr's usage of
> > >>>>>> libata, we don't have the concept of a port frozen state, so this flag
> > >>>>>> should really never get set. The alternate way to fix this would be to
> > >>>>>> only set ATA_PFLAG_FROZEN in ata_port_alloc if ap->ops->error_handler
> > >>>>>> is not NULL.
> > >>>>>
> > >>>>> It seemed like ipr is as you say, but I wasn't sure if it was
> > >>>>> appropriate to make the change above in the common libata-scis code or
> > >>>>> not. I don't want to break some other device on accident.
> > >>>>>
> > >>>>> Also, I tried your suggestion, but I don't think that can happen in
> > >>>>> ata_port_alloc? ata_port_alloc is allocated ap itself, and it seems like
> > >>>>> ap->ops typically gets set only after ata_port_alloc returns?
> > >>>>
> > >>>> Maybe we can test error_handler in ata_sas_port_start()?
> > >>>
> > >>> Good point. Since libsas is converted to the new eh now, we would need to have
> > >>> this test.
> > >>
> > >> Commit 7b3a24c57d2eeda8dba9c205342b12689c4679f9 ("ahci: don't enable
> > >> port irq before handler is registered") caused a regression for CD-ROMs
> > >> attached to the IPR SATA bus on Power machines:
> > >>
> > >>   ata_port_alloc: ENTER
> > >>   ata_port_probe: ata1: bus probe begin
> > >>   ata1.00: ata_dev_read_id: ENTER
> > >>   ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
> > >>   ata1.00: ata_dev_read_id: ENTER
> > >>   ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
> > >>   ata1.00: limiting speed to UDMA7:PIO5
> > >>   ata1.00: ata_dev_read_id: ENTER
> > >>   ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
> > >>   ata1.00: disabled
> > >>   ata_port_probe: ata1: bus probe end
> > >>   scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
> > >>
> > >> The FROZEN flag added in that commit is only cleared by the new EH code,
> > >> which is not used by ipr. Clear this flag in the SAS code if we don't
> > >> support new EH.
> > >>
> > >> Reported-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> > >> Signed-off-by: Nishanth Aravamudan <nacc@xxxxxxxxxx>
> > >>
> > >> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> > >> index d51f979..ebe1685 100644
> > >> --- a/drivers/ata/libata-scsi.c
> > >> +++ b/drivers/ata/libata-scsi.c
> > >> @@ -3797,6 +3797,12 @@ EXPORT_SYMBOL_GPL(ata_sas_port_alloc);
> > >>   */
> > >>  int ata_sas_port_start(struct ata_port *ap)
> > >>  {
> > >> +	/*
> > >> +	 * the port is marked as frozen at allocation time, but if we don't
> > >> +	 * have new eh, we won't thaw it
> > >> +	 */
> > >> +	if (!ap->ops->error_handler)
> > >> +		ap->pflags &= ~ATA_PFLAG_FROZEN;
> > >>  	return 0;
> > >>  }
> > >>  EXPORT_SYMBOL_GPL(ata_sas_port_start);
> > >>
> > >>
> > >> -- 
> > >> Nishanth Aravamudan <nacc@xxxxxxxxxx>
> > >> IBM Linux Technology Center
> > > 
> > 
> > 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux