On Fri, Dec 07, 2018 at 11:44:39AM +0800, Ming Lei wrote:
> On Thu, Dec 06, 2018 at 09:46:42PM -0500, Theodore Y. Ts'o wrote:
> > On Wed, Dec 05, 2018 at 11:03:01AM +0800, Ming Lei wrote:
> > >
> > > But at that time, there was no IO scheduler for MQ, so in theory the
> > > issue should have been there since v4.11, in particular since
> > > 945ffb60c11d ("mq-deadline: add blk-mq adaptation of the deadline IO
> > > scheduler").
> >
> > Hi Ming,
> >
> > How serious were you about this issue (theoretically) being there
> > since 4.11? Can you talk about how it might get triggered, and how we
> > can test for it? The reason I ask is that we're trying to track down
> > a mysterious file system corruption problem on a 4.14.x stable
> > kernel. The symptoms are *very* eerily similar to kernel bugzilla
> > #201685.
>
> Hi Theodore,
>
> It is just a theoretical analysis.
>
> blk_mq_try_issue_directly() is called in two branches of
> blk_mq_make_request(), and both are for real MQ disks.
>
> IO merging can be done with none or with a real IO scheduler, so in
> theory the risk might exist from v4.1, but IO merging on the sw queue
> didn't work for quite a while, until it was fixed by
> ab42f35d9cb5ac49b5a2.
>
> As Jens mentioned in the bugzilla, several conditions are required to
> trigger the issue:
>
> - MQ device
>
> - The queue-busy condition can be triggered. It is hard to trigger on
>   NVMe PCI, but may be possible on NVMe FC. However, it can be quite
>   easy to trigger on SCSI devices. We know of some MQ SCSI HBAs, such
>   as qlogic FC and megaraid_sas.
>
> - IO merging is enabled.
>
> I have set up scsi_debug in the following way:
>
> modprobe scsi_debug dev_size_mb=4096 clustering=1 \
>         max_luns=1 submit_queues=2 max_queue=2
>
> - submit_queues=2 should make this disk MQ
> - max_queue=4 should make the queue-busy condition easy to trigger
>
> and ran some write IO on ext4 over the disk (fio, kernel builds, ...)
> for some time, but still couldn't trigger the data corruption even once.
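A write workload on its own only reveals corruption if something later happens to read the damaged data. A minimal sketch of a more deliberate test, assuming bash, fio and e2fsprogs are installed and that the scsi_debug disk appears at the hypothetical path in DEV: it reports two of the listed preconditions from sysfs, then runs a fio job that verifies everything it wrote, followed by a read-only fsck. The helper names (run, check_preconditions, run_workload), the mount point and all fio parameters are illustrative assumptions, not something used in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: DEV, MNT and all fio job parameters are assumptions,
# not taken from the thread.
DEV=${DEV:-/dev/sdX}      # placeholder for the disk scsi_debug creates
MNT=${MNT:-/mnt/sdebug}   # placeholder mount point

# Echo each command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Report two preconditions that sysfs exposes: whether the disk is
# driven by blk-mq, and whether request merging is enabled.
check_preconditions() {
    local q="/sys/block/${DEV##*/}"
    if [ -d "$q/mq" ]; then echo "blk-mq: yes"; else echo "blk-mq: no"; fi
    # nomerges: 0 = all merges allowed, 1 = simple merges only, 2 = none
    echo "nomerges: $(cat "$q/queue/nomerges")"
}

# Sequential 4k direct writes at queue depth 32 hand the block layer
# adjacent bios that it may merge. verify=crc32c reads the data back and
# checks it, verify_fatal=1 stops at the first mismatch, and the final
# read-only e2fsck looks for metadata damage.
run_workload() {
    run mkfs.ext4 -F "$DEV"
    run mkdir -p "$MNT"
    run mount "$DEV" "$MNT"
    run fio --name=merge-verify --directory="$MNT" \
        --ioengine=libaio --direct=1 --iodepth=32 \
        --rw=write --bs=4k --size=512M --numjobs=4 \
        --verify=crc32c --verify_fatal=1 --group_reporting
    run umount "$MNT"
    run e2fsck -fn "$DEV"
}
```

Run check_preconditions first, then run_workload as root; with DRY_RUN=1 the script only prints the commands it would run, which is a cheap way to review them before pointing it at a real device.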
>
> I should have created more LUNs, so that the queue becomes busy more
> easily; I will do that soon.

Actually, I should have used SDEBUG_OPT_HOST_BUSY to simulate the
queue-busy condition.

Thanks,
Ming
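Both follow-up ideas (more LUNs, and SDEBUG_OPT_HOST_BUSY) can be folded into the scsi_debug setup. A hedged sketch, assuming bash: it reloads scsi_debug with several LUNs, the idea being that they compete for the same small host queue, and sets the opts bitmask to SDEBUG_OPT_HOST_BUSY. Three things here are assumptions: the value 0x8000 for SDEBUG_OPT_HOST_BUSY (as defined in drivers/scsi/scsi_debug.c around v4.20; check the source of the kernel under test), that the busy injection is paced by every_nth like scsi_debug's other error-injection options, and the LUN count and interval, which are arbitrary. The helper names run and load_scsi_debug_busy are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: SDEBUG_OPT_HOST_BUSY is 0x8000, as in
# drivers/scsi/scsi_debug.c of the v4.20 era; check the kernel under test.
SDEBUG_OPT_HOST_BUSY=$((0x8000))

# Echo each command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reload scsi_debug with several LUNs sharing the small host queue, and
# request host-busy injection through the opts/every_nth mechanism.
# The defaults (4 LUNs, every 100th command) are arbitrary choices.
load_scsi_debug_busy() {
    local luns=${1:-4} nth=${2:-100}
    run modprobe -r scsi_debug
    run modprobe scsi_debug dev_size_mb=4096 clustering=1 \
        max_luns="$luns" submit_queues=2 max_queue=2 \
        opts="$SDEBUG_OPT_HOST_BUSY" every_nth="$nth"
}
```

For example, `load_scsi_debug_busy 8 50` would ask for eight LUNs and a busy return on roughly every 50th command. Keeping the injection periodic rather than constant lets most IO still complete, so requests keep being issued, bounced and retried while merging continues.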