On Sun, Dec 09, 2018 at 11:44:19AM -0500, Theodore Y. Ts'o wrote:
> On Sun, Dec 09, 2018 at 12:30:39PM +0100, Greg KH wrote:
> > > P.P.P.S. If I were king, I'd be asking for a huge number of kunit
> > > tests for block-mq to be developed, and then running them under a
> > > Thread Sanitizer.
> >
> > Isn't that what xfs and fio is? Aren't we running this all the time and
> > reporting those issues? How did this bug not show up on those tests, is
> > it just because they didn't run long enough?
> >
> > Because of those test suites, I was thinking that the block and
> > filesystem paths were one of the more well-tested things we had at the
> > moment, is this not true?
>
> I'm pretty confident about the file system paths, and the "happy
> paths" for the block layer.
>
> But with Kernel Bugzilla #201685, despite huge amounts both before and
> after 4.19-rc1, nothing picked it up. It turned out to be very
> configuration specific, *and* only happened when you were under heavy
> memory pressure and/or I/O pressure.
>
> I'm starting to try to use blktests, but it's not as mature as
> xfstests. It has portability issues, as it assumes a much newer
> userspace. So I can't even run it under some environments at all.
> The test coverage just isn't as broad.
> Compare:
>
> ext4/4k: 441 tests, 1 failures, 42 skipped, 4387 seconds
> Failures: generic/388
>
> Versus:
>
> Run: block/001 block/002 block/003 block/004 block/005 block/006
> block/009 block/010 block/012 block/013 block/014 block/015
> block/016 block/017 block/018 block/020 block/021 block/023
> block/024 loop/001 loop/002 loop/003 loop/004 loop/005 loop/006
> nvme/002 nvme/003 nvme/004 nvme/006 nvme/007 nvme/008 nvme/009
> nvme/010 nvme/011 nvme/012 nvme/013 nvme/014 nvme/015 nvme/016
> nvme/017 nvme/019 nvme/020 nvme/021 nvme/022 nvme/023 nvme/024
> nvme/025 nvme/026 nvme/027 nvme/028 scsi/001 scsi/002 scsi/003
> scsi/004 scsi/005 scsi/006 srp/001 srp/002 srp/003 srp/004
> srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012 srp/013
> Failures: block/017 block/024 nvme/002 nvme/003 nvme/008 nvme/009
> nvme/010 nvme/011 nvme/012 nvme/013 nvme/014 nvme/015 nvme/016
> nvme/019 nvme/020 nvme/021 nvme/022 nvme/023 nvme/024 nvme/025
> nvme/026 nvme/027 nvme/028 scsi/006 srp/001 srp/002 srp/003 srp/004
> srp/005 srp/006 srp/007 srp/008 srp/009 srp/010 srp/011 srp/012 srp/013
> Failed 37 of 69 tests
>
> (Most of the failures are test portability issues that I still need to
> work through, not real failures. But just look at the number of
> tests....)

So you are saying quantity rules over quality? :)

It's really hard to judge this, given that xfstests are testing a whole
range of other things (POSIX compliance and stressing the vfs api),
while blktests are there to stress the block i/o api/interface.

So both would be best to run as we know xfstests also hits the block
layer...

thanks,

greg k-h