On 12/11/18 8:34 PM, Mike Snitzer wrote:
> On Mon, Dec 10 2018 at 7:32pm -0500,
> Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>
>> On Mon, Dec 10 2018 at 5:45pm -0500,
>> Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>>> After switching to percpu inflight counters, the inflight check
>>> is totally buggy. It's perfectly valid for some counters to be
>>> non-zero while having a total inflight IO count of 0, that's how
>>> these kinds of counters work (inc on one CPU, dec on another).
>>> Fix the md_in_flight() check to sum all counters before returning
>>> a false positive, potentially.
>>>
>>> While at it, remove the inflight read for IO completion. We don't
>>> need it, just wake anyone that's waiting for the IO count to drop
>>> to zero. The caller needs to re-check that value anyway when woken,
>>> which it does.
>>>
>>> Fixes: 6f75723190d8 ("dm: remove the pending IO accounting")
>>> Reported-by: Christoph Hellwig <hch@xxxxxx>
>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>
>> I'm seeing that device-mapper-test-suite's "resize_io" test doesn't
>> pass. Glad this resolves the xfstest issue but I think more work is
>> needed, so I'll build any additional changes on this fix.
>
> Turns out the device-mapper-test-suite 'resize_io' issue I was seeing is
> just a long lingering race that just got much more likely on my newer
> testbed. This fixes what I was seeing:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.20&id=f6c367585d0d851349d3a9e607c43e5bea993fa1
>
> But pivoting back to xfstests generic/347: that test was broken from the
> start, if it ever _really_ worked it would've been a bug in dm-thinp.
> And even after Jens' fix (above) the test still just times out after 60
> secs while in out-of-data-space (OODS) mode and gives a false "Passed
> all 1 tests".
>
> generic/347 exhausting the thin-pool's data space and then issuing
> "sync" will cause IO to hang for 60 secs unless, in parallel, another
> process comes along and grows the thin-pool to have more data space.
> But even if xfstests' _dmthin_grow function were to be issued in
> parallel with another process it'll fail because it uses
> _dmthin_reload_table which "dmsetup suspend"s the thin device (that was
> just used by XFS to fill the thin-pool) _without_ using the --noflush
> flag for "dmsetup suspend". So the thin device will wait forever for
> its outstanding IO to finish before generic/347's _dmthin_grow will be
> able to resize the thin-pool. Leading to deadlock. Or until the 60
> secs (dm_thin_pool.ko "no_space_timeout" modparam default) causes the
> thin-pool to switch from OODS "queue IO" to OODS "error IO" mode:
>
> # tail -f /var/log/messages &
> # ./check generic/347
> Dec 11 21:12:57 thegoat kernel: XFS (dm-1): Mounting V5 Filesystem
> Dec 11 21:12:57 thegoat kernel: XFS (dm-1): Ending clean mount
> FSTYP -- xfs (non-debug)
> PLATFORM -- Linux/x86_64 thegoat 4.20.0-rc6.snitm+
> MKFS_OPTIONS -- -f -bsize=4096 /dev/mapper/test-metadata
> MOUNT_OPTIONS -- /dev/mapper/test-metadata /scratch
>
> Dec 11 21:12:58 thegoat kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
> Dec 11 21:12:58 thegoat kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space (queue IO) mode
> Dec 11 21:14:01 thegoat kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
> Dec 11 21:14:01 thegoat kernel: xfs_destroy_ioend: 136 callbacks suppressed
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1354752
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1358592
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1362432
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1366272
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1370112
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1373952
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1377792
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1381632
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1385472
> Dec 11 21:14:01 thegoat kernel: XFS (dm-5): writeback error on sector 1389312
> Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: switching pool to write mode
> Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: growing the data device from 1000 to 1200 blocks
> Dec 11 21:14:02 thegoat kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
> Dec 11 21:14:02 thegoat kernel: XFS (dm-5): Unmounting Filesystem
> 65s
> Ran: generic/347
> Passed all 1 tests
>
> Dec 11 21:14:02 thegoat kernel: XFS (dm-1): Unmounting Filesystem
>
> So generic/347 needs to account for 2 things:
> 1) issue $XFS_IO_PROG IO in the background with other process(es)
> 2) in parallel, in main generic/347 shell, run _dmthin_grow but the
>    "thin" device must be suspended with dmsetup suspend --noflush
>
> Eric Sandeen: can you or someone else xfstests inclined use this info to
> fix generic/347 please?

It's not at all clear to me what is wrong with the test.

"Do a bunch of IO including a sync, completely filling the backing
store, then grow the backing store" is surely a real-world scenario
that dm-thinp might encounter, no?

Things do recover, subsequent writes succeed, and when all is said and
done the filesystem is checked and it's still consistent. That sounds
like some version of success to me.

Even if it's emulating a dumb user/administrator, that doesn't seem
invalid.

What is wrong here?

-Eric
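
For reference, below is a rough sketch of the shape Mike's two points
seem to describe, i.e. what a reworked generic/347 flow might look
like. It assumes a --noflush-aware _dmthin_grow; the helper's exact
arguments, the write sizes and the file name here are guesses for
illustration, not the real xfstests code:

  # (1) generate the IO that exhausts the pool's data space, in the background
  $XFS_IO_PROG -f -c "pwrite -b 64k 0 200m" -c fsync $SCRATCH_MNT/testfile &
  io_pid=$!

  # give the writer time to fill the pool and land in OODS (queue IO) mode
  sleep 5

  # (2) grow the pool from the main generic/347 shell while that IO is
  #     still queued; the suspend done on the way to reloading the table
  #     has to be "dmsetup suspend --noflush", otherwise it blocks behind
  #     the very IO that is waiting for the pool to grow
  _dmthin_grow $new_data_blocks

  # with more data space the queued writes complete instead of erroring
  # out when no_space_timeout (60 secs) expires
  wait $io_pid

The key difference from the current test is that the grow happens in
parallel, while the IO that filled the pool is still queued, rather than
after it has already timed out into error-IO mode.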