On Sun, Apr 28, 2019 at 03:39:32PM +0800, Ming Lei wrote: > Now scsi_mq_setup_tags() pre-allocates a big buffer for IO sg list, > and the buffer size is scsi_mq_sgl_size() which depends on smaller > value between shost->sg_tablesize and SG_CHUNK_SIZE. > > Modern HBA's DMA is often capable of deadling with very big segment > number, so scsi_mq_sgl_size() is often big. Suppose the max sg number > of SG_CHUNK_SIZE is taken, scsi_mq_sgl_size() will be 4KB. > > Then if one HBA has lots of queues, and each hw queue's depth is > high, pre-allocation for sg list can consume huge memory. > For example of lpfc, nr_hw_queues can be 70, each queue's depth > can be 3781, so the pre-allocation for data sg list is 70*3781*2k > =517MB for single HBA. > > There is Red Hat internal report that scsi_debug based tests can't > be run any more since legacy io path is killed because too big > pre-allocation. > > So switch to runtime allocation for sg list, meantime pre-allocate 2 > inline sg entries. This way has been applied to NVMe PCI for a while, > so it should be fine for SCSI too. Also runtime sg entries allocation > has verified and run always in the original legacy io path. > > Not see performance effect in my big BS test on scsi_debug. > This patch causes a variety of boot failures in -next. Typical failure pattern is scsi hangs or failure to find a root file system. For example, on alpha, trying to boot from usb: scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 sd 0:0:0:0: Power-on or device reset occurred sd 0:0:0:0: [sda] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] Attached SCSI disk usb 1-1: reset full-speed USB device number 2 using ohci-pci sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 00 58 00 00 42 00 print_req_error: I/O error, dev sda, sector 88 flags 80000 EXT4-fs error (device sda): __ext4_get_inode_loc:4703: inode #2: block 44: comm swapper: unable to read itable block EXT4-fs (sda): get root inode failed EXT4-fs (sda): mount failed VFS: Cannot open root device "sda" or unknown-block(8,0): error -5 Similar problems are seen with other architectures. Reverting the patch fixes the problem. Bisect log attached. Guenter --- # bad: [ae3cad8f39ccf8d31775d9737488bccf0e44d370] Add linux-next specific files for 20190603 # good: [f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a] Linux 5.2-rc3 git bisect start 'HEAD' 'v5.2-rc3' # good: [8ff6f4c6e067a9d3f3bbacf02c4ea5eb81fe2c6a] Merge remote-tracking branch 'crypto/master' git bisect good 8ff6f4c6e067a9d3f3bbacf02c4ea5eb81fe2c6a # good: [6c93755861ce6a6dd904df9cdae9f08671132dbe] Merge remote-tracking branch 'iommu/next' git bisect good 6c93755861ce6a6dd904df9cdae9f08671132dbe # good: [1a567956cb3be5754d94ce9380a2151e57e204a7] Merge remote-tracking branch 'cgroup/for-next' git bisect good 1a567956cb3be5754d94ce9380a2151e57e204a7 # bad: [a6878ca73cf30b83efbdfb1ecc443d7cfb2d8193] Merge remote-tracking branch 'rtc/rtc-next' git bisect bad a6878ca73cf30b83efbdfb1ecc443d7cfb2d8193 # bad: [25e7f57cf9f86076704b543628a0f02d3d733726] Merge remote-tracking branch 'gpio/for-next' git bisect bad 25e7f57cf9f86076704b543628a0f02d3d733726 # bad: [28e85db94534c92fa00d787264e3ea1843d1dc42] scsi: lpfc: Fix nvmet handling of received ABTS for unmapped frames git bisect bad 28e85db94534c92fa00d787264e3ea1843d1dc42 # bad: [cf9eddf616bb03387e597629142b0be34111e8d0] scsi: hpsa: check for tag collision git bisect bad cf9eddf616bb03387e597629142b0be34111e8d0 # good: [4e74166c52a836f05d4bd8270835703908b34d3e] scsi: libsas: switch sas_ata.[ch] to SPDX tags git bisect good 4e74166c52a836f05d4bd8270835703908b34d3e # good: [f186090846c29fd9760917fb3d01f095c39262e0] scsi: lib/sg_pool.c: improve APIs for allocating sg pool git bisect good f186090846c29fd9760917fb3d01f095c39262e0 # bad: [12b6b55806928690359bb21de3a19199412289fd] scsi: sd: Inline sd_probe_part2() git bisect bad 12b6b55806928690359bb21de3a19199412289fd # bad: [c3288dd8c2328a74cb2157dff18d13f6a75cedd2] scsi: core: avoid pre-allocating big SGL for data git bisect bad c3288dd8c2328a74cb2157dff18d13f6a75cedd2 # good: [0f0e744eae6c8af361d227d3a2286973351e5a2a] scsi: core: avoid pre-allocating big SGL for protection information git bisect good 0f0e744eae6c8af361d227d3a2286973351e5a2a # first bad commit: [c3288dd8c2328a74cb2157dff18d13f6a75cedd2] scsi: core: avoid pre-allocating big SGL for data