[Bug 216964] New: LSI SAS1068 logical volume caching mode not detected (with patch)

bugzilla-daemon@xxxxxxxxxx · Mon, 23 Jan 2023 23:27:05 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=216964

            Bug ID: 216964
           Summary: LSI SAS1068 logical volume caching mode not detected
                    (with patch)
           Product: SCSI Drivers
           Version: 2.5
    Kernel Version: 5.15.58
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx
          Reporter: michal.ruza@xxxxxxxxx
        Regression: No

Created attachment 303641
  --> https://bugzilla.kernel.org/attachment.cgi?id=303641&action=edit
Patch to use fixed size buffer for the buggy MODE SENSE command

Hardware:
   HBA card: Broadcom / LSI SAS1068 PCI-X Fusion-MPT SAS
PCI VID:PID: 1000:0054

Problem: Caching mode of logical volumes managed by the controller in question
is not detected.

Relevant kernel messages:
[   19.642388] scsi 8:1:0:0: Direct-Access     LSILOGIC Logical Volume   3000 PQ: 0 ANSI: 2
[   19.649179] sd 8:1:0:0: Attached scsi generic sg6 type 0
[   19.649390] sd 8:1:0:0: [sdd] 583983104 512-byte logical blocks: (299 GB/278 GiB)
[   19.649625] sd 8:1:0:0: [sdd] Write Protect is off
[   19.649629] sd 8:1:0:0: [sdd] Mode Sense: 03 00 00 08
[   19.649837] sd 8:1:0:0: [sdd] No Caching mode page found
[   19.649853] sd 8:1:0:0: [sdd] Assuming drive cache: write through
[   19.666881]  sdd: sdd1 sdd2 sdd3
[   19.667776] sd 8:1:0:0: [sdd] Attached SCSI disk

Cause of the problem:
For the logical volumes managed by this controller, the SCSI MODE SENSE command
is broken: the length field of the returned response does not give the length
of the entire response, only the length of the portion actually written to the
provided buffer (which is limited by the size of that buffer).

This breaks the logic in sd_read_cache_type [1]. That function first issues
MODE SENSE with a small buffer to learn the size of the entire response, then
uses the length field from the partial response to size the buffer for a
second, complete request. For these logical volumes the reported length is
never greater than the size of the provided buffer. In fact it is always 3, as
shown by the first byte in the "Mode Sense:" log message: the 4-byte buffer
passed to MODE SENSE, less the length byte itself. The response is therefore
never received in its entirety, and caching mode detection fails.
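
The failure mode described above can be reproduced in a small model. The
following is only a sketch: good_device, buggy_device, probe and
has_caching_page are hypothetical names, the device functions merely imitate
the two behaviours, and probe is a simplified paraphrase of the two-step logic,
not the kernel code. FULL_RESPONSE is the 24-byte response this volume returns
when given a large buffer.

```python
# Simplified model of the two-step MODE SENSE(6) probe (illustrative only).
FULL_RESPONSE = bytes.fromhex(
    "17000000" "08120400" "ffff0000" "ff00ffff" "000f0000" "00000000"
)


def good_device(alloc_len):
    """Well-behaved device: data is truncated, but byte 0 still
    describes the length of the full response."""
    return FULL_RESPONSE[:alloc_len]


def buggy_device(alloc_len):
    """Models the misbehaving volume: byte 0 only counts the bytes
    that were actually written to the buffer."""
    data = bytearray(FULL_RESPONSE[:alloc_len])
    if data:
        data[0] = len(data) - 1  # length field excludes its own byte
    return bytes(data)


def probe(device, first_len=4):
    """Issue a small sizing request, then refetch only if the device
    claims the full response is longer than the first buffer."""
    buf = device(first_len)
    total_len = buf[0] + 1
    if total_len > first_len:
        buf = device(total_len)
    return buf


def has_caching_page(buf):
    """Walk the mode pages after the 4-byte header and any block
    descriptors, looking for page code 0x08."""
    offset = 4 + buf[3]
    while offset + 1 < len(buf):
        if buf[offset] & 0x3F == 0x08:
            return True
        offset += 2 + buf[offset + 1]
    return False


if __name__ == "__main__":
    print(has_caching_page(probe(good_device)))
    print(has_caching_page(probe(buggy_device)))
    print(has_caching_page(probe(buggy_device, first_len=192)))
```

In this model the buggy device never triggers the second request with a 4-byte
first buffer, so the caching page is never seen. Starting with a 192-byte
buffer sidesteps the broken length field entirely.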

The problem can be demonstrated by the sg_modes command:
- invoked on the misbehaving logical volume:
# sg_modes -6 -p 8 -m 4 -d -H /dev/sdd
    LSILOGIC  Logical Volume    3000   peripheral_type: disk [0x0]
 00     03 00 00 00
- invoked on a correctly behaving disk:
# sg_modes -6 -p 8 -m 4 -d -H /dev/sda
    ATA       WDC WD4003FFBX-6  0A83   peripheral_type: disk [0x0]
 00     17 00 00 00
Note the difference in the length field (the first byte of the response): 0x03
on the logical volume versus 0x17 on the correctly behaving disk.

Nevertheless, the misbehaving logical volume _does_ report the caching mode
correctly when the relevant MODE SENSE command is executed with a large enough
buffer. Again, this can be demonstrated with the sg_modes command:
# sg_modes -6 -p 8 -m 192 -d -H /dev/sdd
    LSILOGIC  Logical Volume    3000   peripheral_type: disk [0x0]
 00     17 00 00 00 08 12 04 00  ff ff 00 00 ff 00 ff ff
 10     00 0f 00 00 00 00 00 00
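
For reference, the response above can be decoded by hand or with a few lines
of code. The sketch below assumes the MODE SENSE(6) layout (4-byte header,
block descriptor length in byte 3) and the standard Caching mode page layout,
where WCE is bit 2 and RCD is bit 0 of byte 2 of the page. parse_caching_page
is a hypothetical helper and ignores subpage formats.

```python
def parse_caching_page(resp):
    """Return the WCE/RCD bits from a MODE SENSE(6) response, or None
    if the Caching mode page (0x08) is not present in the data."""
    offset = 4 + resp[3]                  # skip header and block descriptors
    end = min(len(resp), resp[0] + 1)     # trust neither length blindly
    while offset + 2 < end:
        page_code = resp[offset] & 0x3F
        page_len = resp[offset + 1]
        if page_code == 0x08:
            flags = resp[offset + 2]
            return {
                "wce": bool(flags & 0x04),  # write cache enabled
                "rcd": bool(flags & 0x01),  # read cache disabled
            }
        offset += 2 + page_len
    return None


if __name__ == "__main__":
    # Bytes copied from the sg_modes output with the 192-byte buffer.
    full = bytes.fromhex(
        "17000000" "08120400" "ffff0000" "ff00ffff" "000f0000" "00000000"
    )
    # Bytes copied from the sg_modes output with the 4-byte buffer.
    truncated = bytes.fromhex("03000000")
    print(parse_caching_page(full))
    print(parse_caching_page(truncated))
```

Byte 2 of the page is 0x04, so WCE is set and RCD is clear: write cache
enabled and read cache enabled.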

Possible fix:
struct scsi_device already has a flag that forces the relevant MODE SENSE
command to be executed with a 192-byte buffer, which is large enough to hold
the entire response: use_192_bytes_for_3f [2].
When this flag is set for the misbehaving logical volume, together with the
skip_ms_page_8 flag, caching mode detection works correctly. This has been
verified by applying the attached patch.
With the patch applied, the relevant kernel messages look like this:
[   19.263001] scsi 8:1:0:0: Direct-Access     LSILOGIC Logical Volume   3000 PQ: 0 ANSI: 2
[   19.263190] sd 8:1:0:0: Attached scsi generic sg6 type 0
[   19.263413] sd 8:1:0:0: [sdd] 583983104 512-byte logical blocks: (299 GB/278 GiB)
[   19.263690] sd 8:1:0:0: [sdd] Write Protect is off
[   19.263694] sd 8:1:0:0: [sdd] Mode Sense: 67 00 00 08
[   19.263970] sd 8:1:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   19.279922]  sdd: sdd1 sdd2 sdd3
[   19.280904] sd 8:1:0:0: [sdd] Attached SCSI disk
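
Why both flags are needed can be illustrated with a rough paraphrase of how
sd_read_cache_type [1] appears to choose its first request in v5.15. This is a
sketch from reading that function, not the kernel code: first_request is a
hypothetical name, and the TYPE_RBC and skip_ms_page_3f branches are left out.

```python
def first_request(skip_ms_page_8=False, use_192_bytes_for_3f=False):
    """Return (mode_page, first_len) for the initial MODE SENSE request.

    Assumption: the 192-byte buffer is only considered when page 0x3F
    (all pages) is requested, which happens when skip_ms_page_8 is set.
    """
    if skip_ms_page_8:
        return (0x3F, 192 if use_192_bytes_for_3f else 4)
    return (0x08, 4)


if __name__ == "__main__":
    for skip8 in (False, True):
        for use192 in (False, True):
            page, length = first_request(skip8, use192)
            print(f"skip_ms_page_8={skip8!s:5} use_192_bytes_for_3f={use192!s:5}"
                  f" -> page 0x{page:02X}, buffer {length} bytes")
```

Under these assumptions, setting use_192_bytes_for_3f alone changes nothing,
because the request for page 0x08 still starts with a 4-byte buffer.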

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/scsi/sd.c?h=v5.15.58#n2687
https://elixir.bootlin.com/linux/v5.15.58/source/drivers/scsi/sd.c#L2687

[2]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/scsi/scsi_device.h?h=v5.15.58#n184
https://elixir.bootlin.com/linux/v5.15.58/source/include/scsi/scsi_device.h#L184
