Re: lk 3.17-rc4 blk_mq large write problems

On 14-09-09 11:55 PM, Douglas Gilbert wrote:
A few days ago I was trying to create a large file
(say 16 GB) of zeros on an ext4 file system:
    dd if=/dev/zero bs=64k count=256k of=zero_16g.bin

After about 5 seconds there was a NULL de-reference that
crashed the machine (shown below). This was with a clean
version of lk 3.17-rc4 (from kernel.org) where the target
was a SATA SSD directly connected to an LSI 9300-4i SAS-3
HBA (mpt3sas). Significantly (IMO) the kernel boot line
contained:
    scsi_mod.use_blk_mq=Y
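
For anyone repeating this, it may help to confirm at run time which mode
the SCSI mid-layer is actually in. The following is a minimal sketch, not
part of the original report: it assumes the scsi_mod parameter is visible
in sysfs at /sys/module/scsi_mod/parameters/use_blk_mq (reading Y or N),
and the function name blk_mq_state is made up for illustration.

```shell
#!/bin/sh
# Report whether scsi_mod.use_blk_mq is on, based on the sysfs view of the
# module parameter. The optional first argument overrides the file to read
# (an illustrative convenience for trying this on a machine without it).
blk_mq_state() {
    param_file="${1:-/sys/module/scsi_mod/parameters/use_blk_mq}"
    if [ ! -r "$param_file" ]; then
        # Parameter not exposed (different kernel, or scsi_mod not present).
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|y|1) echo "enabled" ;;
        *)     echo "disabled" ;;
    esac
}

# Print the state for the running kernel; do not fail if it is unknown.
blk_mq_state || true
```

Checking /proc/cmdline as well shows what was requested at boot, while the
sysfs file shows what the module ended up with.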

In all cases changing that to "N" fixed the problem. I tried
many things, including a SAS SSD but the problem persisted when
use_blk_mq=Y. It doesn't always oops as shown in the first
case below. There were also:
   - immediate reboots
   - lock-ups without any oops on the console
   - different oopses of a somewhat stranger nature
     (hard to catch as logging everything on a real
      serial port is fiddly) like double bus errors

Rob Elliott has been unable to replicate this problem.

Today I switched to another machine running Debian 7 (the
first machine was Ubuntu 14.04 based); both x86_64.
Built the same kernel on the second machine, this time
with an LSI 9212-4i4e SAS-2 HBA (mpt2sas) and a SAS SSD
directly connected. Roughly speaking it was the same
test case:
   # <create 1 partition on say /dev/sdb>
   # mkfs.ext4 /dev/sdb1
   # mount /dev/sdb1 /mnt/spare
   # cd /mnt/spare
   # dd if=/dev/zero bs=64k count=256k of=zero_16g.bin
   # cd
   # umount /mnt/spare
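
As a sanity check on the sizes in the dd line above, here is a small
sketch that works out how many bytes bs=64k count=256k writes. It assumes
the k, M and G suffixes mean powers of 1024, as GNU dd treats them; the
helper names to_bytes and dd_total_bytes are made up for illustration.

```shell
#!/bin/sh
# Convert a size with an optional k/K, M or G suffix (powers of 1024)
# into a plain number.
to_bytes() {
    case "$1" in
        *k|*K) echo $(( ${1%[kK]} * 1024 )) ;;
        *M)    echo $(( ${1%M} * 1024 * 1024 )) ;;
        *G)    echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# Total bytes written by dd for a given bs= value ($1) and count= value ($2).
dd_total_bytes() {
    echo $(( $(to_bytes "$1") * $(to_bytes "$2") ))
}

# 65536 * 262144 = 17179869184 bytes, i.e. 16 GiB.
dd_total_bytes 64k 256k
```

So the partition needs a little more than 16 GiB free for the test file
to fit.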

Usually the dd or the umount would crash. Then after a
crash, following a power cycle, the mount would crash.
Changing to scsi_mod.use_blk_mq=N restored sanity.

Tried some other SAS controllers: couldn't get an MR-9240-4i
(MegaRaid) to work at all on my newer box (doesn't like
PCIe 3?). Got an ARC-1882I working and it did not have
problems with the big dd (perhaps the arcmsr driver still
uses the host_lock to serialize commands).

So it could be common, bad code in the mpt2sas and mpt3sas
drivers. Or it could be somewhere else. Perhaps there is
more than one problem.

Testers out there are encouraged to run the above test case.
The SATA and SAS SSDs that I used can consume writes in the
300 to 600 MB/sec range.

Part of the strangeness of this first attached oops is that
blk_mq_timeout_check() appears twice. The second one (typically
from the umount) is a blown stack.

Using the block/for-linus tree that I built today,
the freeze-during-boot-up problem has gone away as
reported earlier.

That allows me to retest the problem reported in this
thread with the same disk (INTEL SSDSA2M080) and the
same configuration. Just did four cycles of the test
sequence shown above plus a shutdown. No problems seen.
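
For testers who want to repeat several cycles, a rough sketch of a loop
follows. It is not the exact procedure used here: it assumes the file
system has already been created (mkfs.ext4 is left out of the loop), the
device and mount point are placeholders, and the function name run_cycles
is made up. By default it only prints the commands.

```shell
#!/bin/sh
# Print (default) or run the mount / dd / umount sequence a number of times.
#   $1 = number of cycles
#   $2 = partition (placeholder, e.g. /dev/sdX1)
#   $3 = mount point
#   $4 = command prefix: "echo" (default) prints only; "env" or "sudo" runs.
run_cycles() {
    cycles="$1"; dev="$2"; mnt="$3"; run="${4:-echo}"
    i=1
    while [ "$i" -le "$cycles" ]; do
        $run mount "$dev" "$mnt" || return 1
        $run dd if=/dev/zero bs=64k count=256k of="$mnt/zero_16g.bin" || return 1
        $run umount "$mnt" || return 1
        i=$((i + 1))
    done
}

# Dry run: list the commands for four cycles without touching any device.
run_cycles 4 /dev/sdX1 /mnt/spare
```

Passing "env" (as root) or "sudo" as the fourth argument executes the
commands instead of printing them; the loop stops at the first failure.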

Doug Gilbert
