On 9/15/21 4:42 PM, Jens Axboe wrote:
> On 9/15/21 1:40 PM, Jens Axboe wrote:
>> On 9/15/21 1:26 PM, Linus Torvalds wrote:
>>> On Wed, Sep 15, 2021 at 11:46 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> The usual tests
>>>> do end up hitting the -EAGAIN path quite easily for certain device
>>>> types, but not the short read/write.
>>>
>>> No way to do something like "read in file to make sure it's cached,
>>> then invalidate caches from position X with POSIX_FADV_DONTNEED, then
>>> do a read that crosses that cached/uncached boundary"?
>>>
>>> To at least verify that "partly synchronous, but partly punted to
>>> async" case?
>>>
>>> Or were you talking about some other situation?
>>
>> No, that covers some of it, and that happens naturally with buffered IO.
>> The typical case is -EAGAIN on the first try, then you get a partial
>> read or all of it on the next loop, and then you're done or you continue.
>> I tend to run fio verification workloads for that, as you get all the
>> flexibility of fio with the data verification. And there are tests in
>> there that run DONTNEED in parallel with buffered IO, exactly to catch
>> some of these cases. But they don't verify the data, generally.
>>
>> In that sense buffered is a lot easier than O_DIRECT, as it's easier to
>> provoke these cases. And that does hit all the save/restore parts and
>> looping, and if you do it with registered buffers then you get to work
>> with the bvec iter as well. O_DIRECT may get you -EAGAIN for low queue
>> depth devices, but it'll never do a short read/write after that.
>>
>> But that's not in the regression tests. I'll write a test case
>> that can go with the liburing regressions for it.
>
> OK, I wrote one, quick'n dirty. It's written as a liburing test, which
> means it can take no arguments (in which case it creates a 128MB file),
> or it can take an argument and it'll use that argument as the file. We
> fill the first 128MB of the file with known data, basically the offset
> of the file. Then we read it back in any of the following ways:
>
> 1) Using a non-vectored read
> 2) Using a vectored read, with segments that fit in UIO_FASTIOV
> 3) Using a vectored read, with more segments than UIO_FASTIOV
>
> This catches all the different cases for a read.
>
> We do that with both buffered and O_DIRECT, and before each pass, we
> randomly DONTNEED either the first, middle, or end part of each segment
> in the read size.
>
> I ran this on my laptop, and I found this:
>
> axboe@p1 ~/gi/liburing (master)> test/file-verify                0.100s
> bad read 229376, read 3
> Buffered novec test failed
> axboe@p1 ~/gi/liburing (master)> test/file-verify                0.213s
> bad read 294912, read 0
> Buffered novec test failed
>
> which is because I'm running the iov_iter.2 stuff, and we're hitting
> that double accounting issue that I mentioned in the cover letter for
> this series. That's why the read return is larger than we ask for
> (128K). Running it on the current branch passes:
>
> [root@archlinux liburing]# for i in $(seq 10); do test/file-verify; done
> [root@archlinux liburing]#
>
> (this is in my test vm that I run on the laptop for kernel testing,
> hence the root and different hostname).
>
> I will add this as a liburing regression test case. It probably needs a
> bit of cleaning up first; it was just a quick prototype, as I thought
> your suggestion was a good one. I will probably change it to run at a
> higher queue depth than just the 1 it does now.
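For reference, the cache-punching idea in the quoted discussion can be sketched
in a few lines of liburing C. This is an illustrative stand-alone example, not
the actual file-verify test; the temp file path and the single 128K chunk are
made up for the example.

/*
 * Minimal sketch of the approach above: fill a chunk with known data
 * (each word holds its file offset), write it back, drop the tail half
 * from the page cache with POSIX_FADV_DONTNEED, then do a buffered
 * io_uring read across the cached/uncached boundary and verify it.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>

#define CHUNK	(128 * 1024)

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	unsigned long *buf;
	unsigned long i;
	int fd, ret;

	fd = open("/tmp/file-verify-sketch", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* known data: each word contains its own file offset */
	buf = malloc(CHUNK);
	for (i = 0; i < CHUNK / sizeof(*buf); i++)
		buf[i] = i * sizeof(*buf);
	if (write(fd, buf, CHUNK) != CHUNK) {
		perror("write");
		return 1;
	}

	/* dirty pages aren't dropped by DONTNEED, so write them back first */
	fsync(fd);

	/* punch the second half of the chunk out of the page cache */
	posix_fadvise(fd, CHUNK / 2, CHUNK / 2, POSIX_FADV_DONTNEED);

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/* one buffered read spanning the cached/uncached boundary */
	memset(buf, 0, CHUNK);
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, CHUNK, 0);
	io_uring_submit(&ring);

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret || cqe->res != CHUNK) {
		fprintf(stderr, "bad read %d\n", ret ? ret : cqe->res);
		return 1;
	}
	io_uring_cqe_seen(&ring, cqe);

	/* verify the pattern survived the partially-cached read */
	for (i = 0; i < CHUNK / sizeof(*buf); i++) {
		if (buf[i] != i * sizeof(*buf)) {
			fprintf(stderr, "miscompare at word %lu\n", i);
			return 1;
		}
	}

	io_uring_queue_exit(&ring);
	close(fd);
	unlink("/tmp/file-verify-sketch");
	return 0;
}

Build with something like "gcc -o sketch sketch.c -luring". Whether the
uncached half is served inline or punted to async depends on the device and
kernel, so the data verification at the end is the part that actually matters.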
Cleaned it up a bit, and added registered buffer support (which is
another variant over non-vectored reads) as well as queued IO support:

https://git.kernel.dk/cgit/liburing/commit/?id=6ab387dab745aff2af760d9fed56a4154669edec

and it's now part of the regular testing. Here's my usual run:

Running test file-verify                              3 sec
Running test file-verify /dev/nvme0n1p2               3 sec
Running test file-verify /dev/nvme1n1p1               3 sec
Running test file-verify /dev/sdc2
Test file-verify timed out (may not be a failure)
Running test file-verify /dev/dm-0                    3 sec
Running test file-verify /data/file                   3 sec

Note that the sdc2 timeout isn't a failure; it's just that emulation on
qemu is slow enough that it takes 1min20s to run, and I time out tests
after 60s in the harness to prevent something stalling forever.

--
Jens Axboe
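For comparison, a rough sketch of the registered-buffer variant at a queue
depth above 1, using only public liburing calls (io_uring_register_buffers()
plus the _fixed read prep). It is illustrative only, not the committed test
linked above; QD, CHUNK, and the omission of data verification are
simplifications.

/*
 * Registered-buffer sketch: register the buffers up front, then use
 * the _fixed read prep so the kernel works with the pre-mapped buffers
 * rather than importing a user iovec for every read.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>
#include <liburing.h>

#define QD	8
#define CHUNK	(128 * 1024)

int main(int argc, char *argv[])
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct iovec iovs[QD];
	int fd, i, ret;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (i = 0; i < QD; i++) {
		iovs[i].iov_base = malloc(CHUNK);
		iovs[i].iov_len = CHUNK;
	}

	io_uring_queue_init(QD, &ring, 0);

	/* pin and map the buffers once, before any IO is issued */
	ret = io_uring_register_buffers(&ring, iovs, QD);
	if (ret) {
		fprintf(stderr, "register_buffers: %d\n", ret);
		return 1;
	}

	/* queue QD reads, each using its own registered buffer index */
	for (i = 0; i < QD; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read_fixed(sqe, fd, iovs[i].iov_base, CHUNK,
					 (__u64)i * CHUNK, i);
	}
	io_uring_submit(&ring);

	/* reap all completions; the real test would verify the data here */
	for (i = 0; i < QD; i++) {
		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret || cqe->res < 0) {
			fprintf(stderr, "bad read %d\n", ret ? ret : cqe->res);
			return 1;
		}
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}

Registering the buffers is what routes the IO through a pre-mapped bvec
iterator instead of a user iovec, which is the bvec iter path mentioned in the
quoted discussion above.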