Re: [PATCH] blk-mq: fix corruption with direct issue

On Wed, Dec 05, 2018 at 11:03:01AM +0800, Ming Lei wrote:
> 
> But at that time, there isn't io scheduler for MQ, so in theory the
> issue should be there since v4.11, especially 945ffb60c11d ("mq-deadline:
> add blk-mq adaptation of the deadline IO scheduler").

Hi Ming,

How serious were you about this being (theoretically) an issue since
4.11?  Can you talk about how it might get triggered, and how we can
test for it?  The reason I ask is that we're trying
to track down a mysterious file system corruption problem on a 4.14.x
stable kernel.  The symptoms are *very* eerily similar to kernel
bugzilla #201685.

The trouble is that the problem is super-rare --- roughly once a week
out of a population of about 2500 systems.  The workload is NFS
serving.  Unfortunately, since 4.14.63 we can no
longer disable blk-mq for the virtio-scsi driver, thanks to the commit
b5b6e8c8d3b4 ("scsi: virtio_scsi: fix IO hang caused by automatic irq
vector affinity") getting backported into 4.14.63 as commit
70b522f163bbb32.

We're considering reverting this patch in our 4.14 LTS kernel, and
seeing whether it makes the problem go away.  Is there anything else
you might suggest?

Thanks,

						- Ted

P.S.  Unlike the repros that users were seeing in #201685, we *did*
have an I/O scheduler enabled --- it was mq-deadline.  But right now,
given your comments, and the corruptions that we're seeing, I'm not
feeling very warm and fuzzy about block-mq.  :-( :-( :-(


