regression next-20220714: mkfs.ext4 on multipath device over scsi disks causes 'livelock' in block layer

On Wed, Mar 02, 2022 at 12:35:55AM -0500, Martin K. Petersen wrote:
> In preparation for adding support for the WRITE SAME(16) NDOB flag,
> move configuration of the WRITE_ZEROES operation to a separate
> function. This is done to facilitate fetching all VPD pages before
> choosing the appropriate zeroing method for a given device.
> 
> The deferred configuration also allows us to mirror the discard
> behavior and permit the user to revert a device to the kernel default
> configuration by echoing "default" to the sysfs file.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
> ---
>  drivers/scsi/sd.c | 56 +++++++++++++++++++++++++++++++++--------------
>  drivers/scsi/sd.h |  7 ++++--
>  2 files changed, 44 insertions(+), 19 deletions(-)
> 

Hello Martin,

somehow this patch triggers a regression on s390x with zFCP in
`next-20220714`.

In our daily regression test suite, a simple:

  # mkfs.ext4 -F /dev/mapper/mpathc1

causes the block layer to trip with the following when trying to discard
blocks (at least that's my assumption about what it's trying to do) from a
SCSI disk:

  [   33.042224] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.042228] device-mapper: multipath: 251:0: Failing path 8:0.
  [   33.042239] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.042267] device-mapper: multipath: 251:0: Failing path 8:64.
  [   33.197329] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   33.198850] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   33.210742] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.210752] device-mapper: multipath: 251:0: Failing path 8:0.
  [   33.210771] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.210792] device-mapper: multipath: 251:0: Failing path 8:64.
  [   38.200929] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   38.201489] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   38.220039] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   38.220045] device-mapper: multipath: 251:0: Failing path 8:0.
  [   38.220056] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   38.220060] device-mapper: multipath: 251:0: Failing path 8:64.
  [   43.202538] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   43.203015] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   43.219877] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   43.219881] device-mapper: multipath: 251:0: Failing path 8:0.
  [   43.219889] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   43.219892] device-mapper: multipath: 251:0: Failing path 8:64.
  [   48.204035] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   48.204526] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   48.219951] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   48.219964] device-mapper: multipath: 251:0: Failing path 8:0.
  [   48.219990] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   48.219996] device-mapper: multipath: 251:0: Failing path 8:64.
  [   53.205456] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   53.206950] device-mapper: multipath: 251:0: Reinstating path 8:64.
  [   53.219820] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   53.219824] device-mapper: multipath: 251:0: Failing path 8:0.
  [   53.219834] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   53.219837] device-mapper: multipath: 251:0: Failing path 8:64.
  [   58.209693] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   58.210143] device-mapper: multipath: 251:0: Reinstating path 8:64.

This continues ad infinitum.
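
To quantify the loop from a captured dmesg, something like the following
sketch could be used. It only relies on the two message formats visible in
the log above; the names `summarize`, `sectors_to_bytes` and `SAMPLE` are
made up for illustration. The byte conversion assumes that both numbers in
the "over max size limit" message are 512-byte sectors, which is my reading
of the message and not something the log itself states.

```python
import re
from collections import Counter

# Patterns for the two message types seen in the log above.
LIMIT_RE = re.compile(
    r"blk_insert_cloned_request: over max size limit\. \((\d+) > (\d+)\)"
)
PATH_RE = re.compile(
    r"device-mapper: multipath: (\d+:\d+): (Failing|Reinstating) path (\d+:\d+)\."
)

SECTOR_SIZE = 512  # assumption: the logged values are 512-byte sectors


def sectors_to_bytes(sectors):
    """Convert a sector count to bytes under the 512-byte assumption."""
    return sectors * SECTOR_SIZE


def summarize(log_text):
    """Count (request size, limit) pairs and per-path fail/reinstate events."""
    limits = Counter()
    events = Counter()
    for line in log_text.splitlines():
        m = LIMIT_RE.search(line)
        if m:
            limits[(int(m.group(1)), int(m.group(2)))] += 1
            continue
        m = PATH_RE.search(line)
        if m:
            # Key by (path major:minor, action).
            events[(m.group(3), m.group(2))] += 1
    return limits, events


# First cycle of the log above, copied verbatim.
SAMPLE = """\
  [   33.042224] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.042228] device-mapper: multipath: 251:0: Failing path 8:0.
  [   33.042239] blk_insert_cloned_request: over max size limit. (4194304 > 65535)
  [   33.042267] device-mapper: multipath: 251:0: Failing path 8:64.
  [   33.197329] device-mapper: multipath: 251:0: Reinstating path 8:0.
  [   33.198850] device-mapper: multipath: 251:0: Reinstating path 8:64.
"""

if __name__ == "__main__":
    limits, events = summarize(SAMPLE)
    for (size, limit), count in limits.items():
        print(f"{count}x request of {size} sectors against limit {limit} "
              f"({sectors_to_bytes(size)} > {sectors_to_bytes(limit)} bytes, "
              f"if sectors are {SECTOR_SIZE} bytes)")
    for (path, action), count in sorted(events.items()):
        print(f"path {path}: {action} x{count}")
```

Fed with the full dmesg instead of `SAMPLE`, the per-path counters should
grow by one "Failing" and one "Reinstating" roughly every five seconds for
as long as the mkfs is stuck.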

I suspected this patchset because it's new in next-20220714, and
next-20220704 didn't show this bug in our regression runs. I didn't see any
other obviously related change in scsi or block between those two tags.

I started bisecting between:
  9821106213c8 ("scsi: zfcp: Drop redundant "the" in the comments")
as good, and
  f095c3cd1b69 ("scsi: qla2xxx: Update version to 10.02.07.800-k")
as bad; and ended up at:
  1bd95bb98f83 ("scsi: sd: Move WRITE_ZEROES configuration to a separate function")

I ran this on Fedora 36, with mkfs.ext4 1.46.5 (30-Dec-2021).

The multipath device:

  create: mpathc (36005076307ffc5e3000000000000805c) dm-0 IBM,2107900
  size=20G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
  `-+- policy='service-time 0' prio=50 status=active
    |- 0:0:0:1079787648 sda 8:0   active ready running
    `- 1:0:0:1079787648 sde 8:64  active ready running

Some information on the block devices and topology:

  # lsblk /dev/sda /dev/sde
  NAME        MAJ:MIN RM SIZE RO TYPE  MOUNTPOINTS
  sda           8:0    0  20G  0 disk
  └─mpathc    251:0    0  20G  0 mpath
    └─mpathc1 251:2    0  20G  0 part
  sde           8:64   0  20G  0 disk
  └─mpathc    251:0    0  20G  0 mpath
    └─mpathc1 251:2    0  20G  0 part
  # lsblk -t /dev/sda /dev/sde
  NAME        ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE  RA WSAME
  sda                 0    512      0     512     512    1 bfq             256 512    0B
  └─mpathc            0    512      0     512     512    1 mq-deadline     256 128    0B
    └─mpathc1         0    512      0     512     512    1                 128 128    0B
  sde                 0    512      0     512     512    1 bfq             256 512    0B
  └─mpathc            0    512      0     512     512    1 mq-deadline     256 128    0B
    └─mpathc1         0    512      0     512     512    1                 128 128    0B
  # lsblk -D /dev/sda /dev/sde
  NAME        DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
  sda                0        1G       4G         0
  └─mpathc           0        1G       4G         0
    └─mpathc1        0        1G       4G         0
  sde                0        1G       4G         0
  └─mpathc           0        1G       4G         0
    └─mpathc1        0        1G       4G         0
  # lsblk -S /dev/sda /dev/sde
  NAME HCTL             TYPE VENDOR   MODEL    REV SERIAL      TRAN
  sda  0:0:0:1079787648 disk IBM      2107900 1060 75DL241805C fc
  sde  1:0:0:1079787648 disk IBM      2107900 1060 75DL241805C fc
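
The lsblk output above doesn't show the write-zeroes limits, so a small
sketch like the one below could be used to dump the relevant queue limits
of the paths and the multipath device side by side. The device names are
taken from the output above (`dm-0` for mpathc); the selection of sysfs
attributes and the helper name `read_queue_limits` are my own choice for
illustration, and attributes that are missing on a given kernel are simply
reported as None.

```python
from pathlib import Path

# Queue attributes that seem relevant here; this selection is an assumption.
ATTRS = (
    "max_sectors_kb",
    "max_hw_sectors_kb",
    "discard_max_bytes",
    "write_zeroes_max_bytes",
)


def read_queue_limits(devices, sysfs_root="/sys/block", attrs=ATTRS):
    """Return {device: {attribute: int or None}} read from <root>/<dev>/queue/."""
    result = {}
    for dev in devices:
        queue = Path(sysfs_root) / dev / "queue"
        limits = {}
        for attr in attrs:
            try:
                limits[attr] = int((queue / attr).read_text().strip())
            except (OSError, ValueError):
                # Attribute missing or unreadable on this system.
                limits[attr] = None
        result[dev] = limits
    return result


if __name__ == "__main__":
    for dev, limits in read_queue_limits(["sda", "sde", "dm-0"]).items():
        print(dev, limits)
```

If the multipath device advertises a larger limit for some operation than
the underlying paths do, that would fit the "over max size limit" messages,
but this is only a way to collect the numbers, not a diagnosis.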

Any idea why this is happening? I can provide more details if needed; this
reproduces very reliably here.

-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH    /    https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen         /         Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294


