On 9/15/21 4:42 PM, Jens Axboe wrote:
> On 9/15/21 1:40 PM, Jens Axboe wrote:
>> On 9/15/21 1:26 PM, Linus Torvalds wrote:
>>> On Wed, Sep 15, 2021 at 11:46 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> The usual tests
>>>> do end up hitting the -EAGAIN path quite easily for certain device
>>>> types, but not the short read/write.
>>>
>>> No way to do something like "read in file to make sure it's cached,
>>> then invalidate caches from position X with POSIX_FADV_DONTNEED, then
>>> do a read that crosses that cached/uncached boundary"?
>>>
>>> To at least verify that "partly synchronous, but partly punted to
>>> async" case?
>>>
>>> Or were you talking about some other situation?
>>
>> No, that covers some of it, and that happens naturally with buffered IO.
>> The typical case is -EAGAIN on the first try, then you get a partial
>> read or all of it on the next loop, and then you're done or you continue.
>> I tend to run fio verification workloads for that, as you get all the
>> flexibility of fio with the data verification. And there are tests in
>> there that run DONTNEED in parallel with buffered IO, exactly to catch
>> some of these cases. But they don't verify the data, generally.
>>
>> In that sense buffered is a lot easier than O_DIRECT, as it's easier to
>> provoke these cases. And that does hit all the save/restore parts and
>> looping, and if you do it with registered buffers then you get to work
>> with the bvec iter as well. O_DIRECT may get you -EAGAIN for low queue
>> depth devices, but it'll never do a short read/write after that.
>>
>> But that's not in the regression tests. I'll write a test case
>> that can go with the liburing regressions for it.
>
> OK, I wrote one, quick'n dirty. It's written as a liburing test, which
> means it can take no arguments (in which case it creates a 128MB file),
> or it can take an argument and it'll use that argument as the file. We
> fill the first 128MB of the file with known data, basically the offset
> of the file. Then we read it back in any of the following ways:
>
> 1) Using a non-vectored read
> 2) Using a vectored read, with segments that fit in UIO_FASTIOV
> 3) Using a vectored read, with more segments than UIO_FASTIOV
>
> This catches all the different cases for a read.
>
> We do that with both buffered and O_DIRECT, and before each pass, we
> randomly DONTNEED either the first, middle, or end part of each segment
> in the read size.
>
> I ran this on my laptop, and I found this:
>
> axboe@p1 ~/gi/liburing (master)> test/file-verify                0.100s
> bad read 229376, read 3
> Buffered novec test failed
> axboe@p1 ~/gi/liburing (master)> test/file-verify                0.213s
> bad read 294912, read 0
> Buffered novec test failed
>
> which is because I'm running the iov_iter.2 stuff, and we're hitting
> that double accounting issue that I mentioned in the cover letter for
> this series. That's why the read return is larger than we ask for
> (128K). Running it on the current branch passes:
>
> [root@archlinux liburing]# for i in $(seq 10); do test/file-verify; done
> [root@archlinux liburing]#
>
> (this is in my test vm that I run on the laptop for kernel testing,
> hence the root and different hostname).
>
> I will add this as a liburing regression test case. It probably needs a
> bit of cleaning up first; it was just a quick prototype, as I thought
> your suggestion was a good one. I will probably change it to run at a
> higher queue depth than just the 1 it does now.
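For reference, the cache-punching idea in the quoted discussion can be sketched
in a few lines of liburing C. This is an illustrative stand-alone example, not
the actual file-verify test; the temp file path and the single 128K chunk are
made up for the example.

/*
 * Minimal sketch of the approach above: fill a chunk with known data
 * (each word holds its file offset), write it back, drop the tail half
 * from the page cache with POSIX_FADV_DONTNEED, then do a buffered
 * io_uring read across the cached/uncached boundary and verify it.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>

#define CHUNK	(128 * 1024)

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	unsigned long *buf;
	unsigned long i;
	int fd, ret;

	fd = open("/tmp/file-verify-sketch", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* known data: each word contains its own file offset */
	buf = malloc(CHUNK);
	for (i = 0; i < CHUNK / sizeof(*buf); i++)
		buf[i] = i * sizeof(*buf);
	if (write(fd, buf, CHUNK) != CHUNK) {
		perror("write");
		return 1;
	}

	/* dirty pages aren't dropped by DONTNEED, so write them back first */
	fsync(fd);

	/* punch the second half of the chunk out of the page cache */
	posix_fadvise(fd, CHUNK / 2, CHUNK / 2, POSIX_FADV_DONTNEED);

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/* one buffered read spanning the cached/uncached boundary */
	memset(buf, 0, CHUNK);
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, CHUNK, 0);
	io_uring_submit(&ring);

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret || cqe->res != CHUNK) {
		fprintf(stderr, "bad read %d\n", ret ? ret : cqe->res);
		return 1;
	}
	io_uring_cqe_seen(&ring, cqe);

	/* verify the pattern survived the partially-cached read */
	for (i = 0; i < CHUNK / sizeof(*buf); i++) {
		if (buf[i] != i * sizeof(*buf)) {
			fprintf(stderr, "miscompare at word %lu\n", i);
			return 1;
		}
	}

	io_uring_queue_exit(&ring);
	close(fd);
	unlink("/tmp/file-verify-sketch");
	return 0;
}

Build with something like "gcc -o sketch sketch.c -luring". Whether the
uncached half is served inline or punted to async depends on the device and
kernel, so the data verification at the end is the part that actually matters.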
Cleaned it up a bit, and added registered buffer support (which is
another variant over non-vectored reads) as well as queued IO support:

https://git.kernel.dk/cgit/liburing/commit/?id=6ab387dab745aff2af760d9fed56a4154669edec

and it's now part of the regular testing. Here's my usual run:

Running test file-verify                              3 sec
Running test file-verify /dev/nvme0n1p2               3 sec
Running test file-verify /dev/nvme1n1p1               3 sec
Running test file-verify /dev/sdc2
Test file-verify timed out (may not be a failure)
Running test file-verify /dev/dm-0                    3 sec
Running test file-verify /data/file                   3 sec

Note that the sdc2 timeout isn't a failure; it's just that emulation on
qemu is slow enough that it takes 1min20s to run, and I time out tests
after 60s in the harness to prevent something stalling forever.

--
Jens Axboe
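For comparison, a rough sketch of the registered-buffer variant at a queue
depth above 1, using only public liburing calls (io_uring_register_buffers()
plus the _fixed read prep). It is illustrative only, not the committed test
linked above; QD, CHUNK, and the omission of data verification are
simplifications.

/*
 * Registered-buffer sketch: register the buffers up front, then use
 * the _fixed read prep so the kernel works with the pre-mapped buffers
 * rather than importing a user iovec for every read.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>
#include <liburing.h>

#define QD	8
#define CHUNK	(128 * 1024)

int main(int argc, char *argv[])
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct iovec iovs[QD];
	int fd, i, ret;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (i = 0; i < QD; i++) {
		iovs[i].iov_base = malloc(CHUNK);
		iovs[i].iov_len = CHUNK;
	}

	io_uring_queue_init(QD, &ring, 0);

	/* pin and map the buffers once, before any IO is issued */
	ret = io_uring_register_buffers(&ring, iovs, QD);
	if (ret) {
		fprintf(stderr, "register_buffers: %d\n", ret);
		return 1;
	}

	/* queue QD reads, each using its own registered buffer index */
	for (i = 0; i < QD; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read_fixed(sqe, fd, iovs[i].iov_base, CHUNK,
					 (__u64)i * CHUNK, i);
	}
	io_uring_submit(&ring);

	/* reap all completions; the real test would verify the data here */
	for (i = 0; i < QD; i++) {
		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret || cqe->res < 0) {
			fprintf(stderr, "bad read %d\n", ret ? ret : cqe->res);
			return 1;
		}
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}

Registering the buffers is what routes the IO through a pre-mapped bvec
iterator instead of a user iovec, which is the bvec iter path mentioned in the
quoted discussion above.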