Re: Kernel 6.8.4 regression: aacraid controller not initialized any more, system boot hangs

On 09.05.2024 at 04:12, Peter Schneider wrote:

> Hi Martin,
>
> On 09.05.2024 at 03:38, Martin K. Petersen wrote:
>  >
>  > Hi Peter!
>  >
>  > Thanks for the detailed bug report.
>
> Thanks for looking into the issue! I have also CC'ed the relevant regressions
> tracker and mailing list. For reference, my original message can be found here:
>
> https://lore.kernel.org/all/eec6ebbf-061b-4a7b-96dc-ea748aa4d035@xxxxxxxxxxxxxx/
>
> [...]
>
>  > Can you please send me the output of:
>  >
>  > # sg_vpd -a /dev/sda
>  > # sg_readcap -l /dev/sda
>  >
>  > where sda is one of the aacraid volumes.
>
>
> Here you go... sda is the 1TiB RAID1 array, sdb is the 5TiB RAID5 array.
>
>
> root@linus:~# uname -r
> 6.5.13-5-pve
> root@linus:~# sg_vpd -a /dev/sda
> Supported VPD pages VPD page:
>    Supported VPD pages [sv]
>    Unit serial number [sn]
>    Device identification [di]
>
> Unit serial number VPD page:
>    Unit serial number: 50C0B82D
>
> Device Identification VPD page:
>    Addressed logical unit:
>      designator type: T10 vendor identification,  code set: ASCII
>        vendor id: ADAPTEC
>        vendor specific: ARRAY           50C0B82D
>      designator type: EUI-64 based,  code set: Binary
>        0x2db8c05000d00000
> root@linus:~# sg_readcap -l /dev/sda
> Read Capacity results:
>     Protection: prot_en=0, p_type=0, p_i_exponent=0
>     Logical block provisioning: lbpme=0, lbprz=0
>     Last LBA=1998565375 (0x771fafff), Number of logical blocks=1998565376
>     Logical block length=512 bytes
>     Logical blocks per physical block exponent=0
>     Lowest aligned LBA=0
> Hence:
>     Device size: 1023265472512 bytes, 975862.0 MiB, 1023.27 GB
> root@linus:~# sg_vpd -a /dev/sdb
> Supported VPD pages VPD page:
>    Supported VPD pages [sv]
>    Unit serial number [sn]
>    Device identification [di]
>
> Unit serial number VPD page:
>    Unit serial number: 8718162D
>
> Device Identification VPD page:
>    Addressed logical unit:
>      designator type: T10 vendor identification,  code set: ASCII
>        vendor id: ADAPTEC
>        vendor specific: ARRAY           8718162D
>      designator type: EUI-64 based,  code set: Binary
>        0x2d16188700d00000
> root@linus:~# sg_readcap -l /dev/sdb
> Read Capacity results:
>     Protection: prot_en=0, p_type=0, p_i_exponent=0
>     Logical block provisioning: lbpme=0, lbprz=0
>     Last LBA=9762222079 (0x245dfafff), Number of logical blocks=9762222080
>     Logical block length=512 bytes
>     Logical blocks per physical block exponent=0
>     Lowest aligned LBA=0
> Hence:
>     Device size: 4998257704960 bytes, 4766710.0 MiB, 4998.26 GB, 5.00 TB
>
>
> Best regards,
> Peter Schneider
>
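
As a quick sanity check on the sg_readcap figures quoted above: the device size should simply be (Last LBA + 1) times the logical block length. A minimal Python sketch of that arithmetic follows; the helper name `capacity` is made up for illustration, and nothing here talks to the hardware.

```python
# Sanity check of the sg_readcap figures quoted above.
# Assumption: 512-byte logical blocks, as reported for both arrays.
BLOCK_SIZE = 512


def capacity(last_lba: int, block_size: int = BLOCK_SIZE) -> dict:
    """Derive block count and sizes from the last addressable LBA."""
    blocks = last_lba + 1          # LBAs are numbered from 0
    size = blocks * block_size     # total size in bytes
    return {
        "blocks": blocks,
        "bytes": size,
        "mib": size / 2**20,       # binary mebibytes
        "gb": size / 10**9,        # decimal gigabytes
    }


sda = capacity(0x771FAFFF)    # RAID1 array, Last LBA from sg_readcap
sdb = capacity(0x245DFAFFF)   # RAID5 array, Last LBA from sg_readcap

for name, info in (("sda", sda), ("sdb", sdb)):
    print(f"{name}: {info['blocks']} blocks, {info['bytes']} bytes, "
          f"{info['mib']:.1f} MiB, {info['gb']:.2f} GB")
```

Both results agree with the "Device size" lines printed by sg_readcap, so the capacity reporting of the two aacraid volumes at least looks self-consistent.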


I just found something else which looks interesting and might or might not be related to the regression. To get the diagnostic output you requested, I obviously booted into the working kernel version 6.5.13-5-pve, see above. Out of curiosity, I also ran these commands on my other drives, sdc (PVE installation and root device) and sdf (my storage for VM ISO installation images). Both are older Micron M4 SATA SSDs.

It turns out that these drives seem to have buggy firmware: they don't return all the VPD pages they advertise. Querying the advertised VPD page 0xb7 gives "sg_vpd failed: Illegal request", please see below... Is this a smoking gun? I previously used both drives in a Windows box for about 2 years, until I replaced them and put them aside. Then in 2015 I recycled them for use in my newly built server machine. Before their original use in that Windows box, I upgraded their firmware to 0309, because the factory firmware had known issues with Windows.

In 2015, I didn't bother to look for newer firmware. But there is one, 070h, here:

https://www.crucial.de/support/ssd-support/m4-25-inch-support

and in the release notes

https://content.crucial.com/content/dam/crucial/ssd-products/m4/documents/crucial-m4-firmware-update-070h-en.pdf

there is mention of a potential device hang during power-up being fixed in FW 070h.

Do you think I should try applying this FW upgrade, to see whether
- it fixes the issue below of an advertised VPD page not being returned
- it might fix the whole regression, i.e. the Adaptec controller no longer being initialized with your kernel patch b5fc07a5fb56216a49e6c1d0b172d5464d99a89b?

Or is this just guesswork? I mean, in the dmesg output, the Adaptec controller is initialized BEFORE sdc and sdd. I don't know...





root@linus:~# sg_vpd -a /dev/sdc
Supported VPD pages VPD page:
  Supported VPD pages [sv]
  Unit serial number [sn]
  Device identification [di]
  ATA information (SAT) [ai]
  Block limits (SBC) [bl]
  Block device characteristics (SBC) [bdc]
  Logical block provisioning (SBC) [lbpv]
  Concurrent positioning ranges [cpr]

Unit serial number VPD page:
  Unit serial number: 000000001141031B85A2

Device Identification VPD page:
  Addressed logical unit:
    designator type: vendor specific [0x0],  code set: ASCII
      vendor specific: 000000001141031B85A2
    designator type: T10 vendor identification,  code set: ASCII
      vendor id: ATA
      vendor specific: M4-CT256M4SSD2                          000000001141031B85A2
    designator type: NAA,  code set: Binary
      0x500a0751031b85a2

ATA information VPD page:
  SAT Vendor identification: linux
  SAT Product identification: libata
  SAT Product revision level: 3.00
  Device signature indicates SATA transport
  Command code: 0xec
  ATA command IDENTIFY DEVICE response summary:
    model: M4-CT256M4SSD2
    serial number: 000000001141031B85A2
    firmware revision: 0309

Block limits VPD page (SBC):
  Write same non-zero (WSNZ): 0
  Maximum compare and write length: 0 blocks [Command not implemented]
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 0 blocks [not reported]
  Optimal transfer length: 0 blocks [not reported]
  Maximum prefetch transfer length: 0 blocks [ignored]
  Maximum unmap LBA count: 0 [Unmap command not implemented]
  Maximum unmap block descriptor count: 0 [Unmap command not implemented]
  Optimal unmap granularity: 1 blocks
  Unmap granularity alignment valid: false
  Unmap granularity alignment: 0 [invalid]
  Maximum write same length: 0x3fffc0 blocks
  Maximum atomic transfer length: 0 blocks [not reported]
  Atomic alignment: 0 [unaligned atomic writes permitted]
  Atomic transfer length granularity: 0 [no granularity requirement
  Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
  Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]

Block device characteristics VPD page (SBC):
  Non-rotating medium (e.g. solid state)
  Product type: Not specified
  WABEREQ=0
  WACEREQ=0
  Nominal form factor: 2.5 inch
  ZONED=0
  RBWZ=0
  BOCS=0
  FUAB=0
  VBULS=0
  DEPOPULATION_TIME=0 (seconds)

Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 0
  Write same (16) with unmap bit supported (LBPWS): 1
  Write same (10) with unmap bit supported (LBPWS10): 0
  Logical block provisioning read zeros (LBPRZ): 0
  Anchored LBAs supported (ANC_SUP): 0
  Threshold exponent: 0 [threshold sets not supported]
  Descriptor present (DP): 0
  Minimum percentage: 0 [not reported]
  Provisioning type: 0 (not known or fully provisioned)
  Threshold percentage: 0 [percentages not supported]

VPD page=0xb7
fetching VPD page failed: Illegal request
sg_vpd failed: Illegal request
root@linus:~# sg_vpd -a /dev/sdf
Supported VPD pages VPD page:
  Supported VPD pages [sv]
  Unit serial number [sn]
  Device identification [di]
  ATA information (SAT) [ai]
  Block limits (SBC) [bl]
  Block device characteristics (SBC) [bdc]
  Logical block provisioning (SBC) [lbpv]
  Concurrent positioning ranges [cpr]

Unit serial number VPD page:
  Unit serial number: 00000000120103285ED2

Device Identification VPD page:
  Addressed logical unit:
    designator type: vendor specific [0x0],  code set: ASCII
      vendor specific: 00000000120103285ED2
    designator type: T10 vendor identification,  code set: ASCII
      vendor id: ATA
      vendor specific: M4-CT256M4SSD2                          00000000120103285ED2
    designator type: NAA,  code set: Binary
      0x500a075103285ed2

ATA information VPD page:
  SAT Vendor identification: linux
  SAT Product identification: libata
  SAT Product revision level: 3.00
  Device signature indicates SATA transport
  Command code: 0xec
  ATA command IDENTIFY DEVICE response summary:
    model: M4-CT256M4SSD2
    serial number: 00000000120103285ED2
    firmware revision: 0309

Block limits VPD page (SBC):
  Write same non-zero (WSNZ): 0
  Maximum compare and write length: 0 blocks [Command not implemented]
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 0 blocks [not reported]
  Optimal transfer length: 0 blocks [not reported]
  Maximum prefetch transfer length: 0 blocks [ignored]
  Maximum unmap LBA count: 0 [Unmap command not implemented]
  Maximum unmap block descriptor count: 0 [Unmap command not implemented]
  Optimal unmap granularity: 1 blocks
  Unmap granularity alignment valid: false
  Unmap granularity alignment: 0 [invalid]
  Maximum write same length: 0x3fffc0 blocks
  Maximum atomic transfer length: 0 blocks [not reported]
  Atomic alignment: 0 [unaligned atomic writes permitted]
  Atomic transfer length granularity: 0 [no granularity requirement
  Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
  Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]

Block device characteristics VPD page (SBC):
  Non-rotating medium (e.g. solid state)
  Product type: Not specified
  WABEREQ=0
  WACEREQ=0
  Nominal form factor: 2.5 inch
  ZONED=0
  RBWZ=0
  BOCS=0
  FUAB=0
  VBULS=0
  DEPOPULATION_TIME=0 (seconds)

Logical block provisioning VPD page (SBC):
  Unmap command supported (LBPU): 0
  Write same (16) with unmap bit supported (LBPWS): 1
  Write same (10) with unmap bit supported (LBPWS10): 0
  Logical block provisioning read zeros (LBPRZ): 0
  Anchored LBAs supported (ANC_SUP): 0
  Threshold exponent: 0 [threshold sets not supported]
  Descriptor present (DP): 0
  Minimum percentage: 0 [not reported]
  Provisioning type: 0 (not known or fully provisioned)
  Threshold percentage: 0 [percentages not supported]

VPD page=0xb7
fetching VPD page failed: Illegal request
sg_vpd failed: Illegal request
root@linus:~# man sg_vpd
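
To make the comparison between drives less error-prone, the `sg_vpd -a` text above can be scanned mechanically for pages whose fetch failed. The following is only a sketch: it assumes the output format shown in this mail (a `VPD page=0x..` line followed by `fetching VPD page failed`), and the function name `failed_vpd_pages` is hypothetical.

```python
import re


def failed_vpd_pages(sg_vpd_output: str) -> list[str]:
    """Return page codes (e.g. '0xb7') whose fetch failed in `sg_vpd -a` text.

    Assumes the layout seen in this mail: an undecoded page is announced as
    'VPD page=0x..' and a failure is reported on a following line.
    """
    failed = []
    current_page = None
    for line in sg_vpd_output.splitlines():
        match = re.match(r"\s*VPD page=(0x[0-9a-fA-F]+)", line)
        if match:
            # Remember the most recently announced raw page code.
            current_page = match.group(1).lower()
        elif "fetching VPD page failed" in line and current_page:
            failed.append(current_page)
            current_page = None
    return failed


# Excerpt of the sdc output from this mail, used as sample input.
sample = """\
Logical block provisioning VPD page (SBC):
  Threshold percentage: 0 [percentages not supported]

VPD page=0xb7
fetching VPD page failed: Illegal request
sg_vpd failed: Illegal request
"""

print(failed_vpd_pages(sample))  # ['0xb7']
```

When feeding it live output instead of a paste, the failure lines may well go to stderr rather than stdout (I have not checked), so both streams would need to be captured.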





Best regards,
Peter Schneider

--
Climb the mountain not to plant your flag, but to embrace the challenge,
enjoy the air and behold the view. Climb it so you can see the world,
not so the world can see you.                    -- David McCullough Jr.

OpenPGP:  0xA3828BD796CCE11A8CADE8866E3A92C92C3FF244
Download: https://www.peters-netzplatz.de/download/pschneider1968_pub.asc
https://keys.mailvelope.com/pks/lookup?op=get&search=pschneider1968@xxxxxxxxxxxxxx
https://keys.mailvelope.com/pks/lookup?op=get&search=pschneider1968@xxxxxxxxx
