Re: [PATCH 1/2] io_u: fix offset generation for mix of random and sequential workload

On 11/30/22 05:50, Ankit Kumar wrote:
On Mon, Nov 28, 2022 at 04:57:28PM -0500, Vincent Fu wrote:
On 11/18/22 00:14, Ankit Kumar wrote:
Sometimes while running a mix of sequential and random I/Os, the
generated sequential offset falls outside the I/O region. This usually
happens when the previous I/O was the last block in the region, which
leaves the next sequential offset just past the end of the region.
With this fix, fio generates a random offset within the I/O region
instead.

This fixes #1486

Signed-off-by: Ankit Kumar <ankit.kumar@xxxxxxxxxxx>
---
 io_u.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/io_u.c b/io_u.c
index 8035f4b7..e49b1b29 100644
--- a/io_u.c
+++ b/io_u.c
@@ -432,8 +432,11 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
 				*is_random = false;
 				io_u_set(td, io_u, IO_U_F_BUSY_OK);
 				ret = get_next_seq_offset(td, f, ddir, &offset);
-				if (ret)
+				if (ret || offset >= f->io_size) {
 					ret = get_next_rand_block(td, f, ddir, &b);
+					offset = -1ULL;
+					*is_random = true;
+				}
 			}
 		} else {
 			*is_random = false;

This patch causes a regression with a workload such as:

fio --name=test --bs=4k --filesize=32k --nrfiles=10 --number_ios=10
--file_service_type=sequential --percentage_random=50 --rw=randread
--randrepeat=0 --ioengine=null --debug=io

Only the first file will be accessed when this patch is applied. Bad offsets
are needed in order to trigger the move to the next file.

Here is a related issue: https://github.com/axboe/fio/issues/1372

Perhaps the right thing to do is to limit the offset >= f->io_size
check to cases where nrfiles == 1, as in the patch for issue 1372.
Something like the sketch below:
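
(Untested sketch on top of your patch; I am assuming td->o.nr_files is
the right field to gate on here, matching what the 1372 fix did.)

				ret = get_next_seq_offset(td, f, ddir, &offset);
				/*
				 * Assumption: only fall back to a random block for
				 * single-file jobs; multi-file jobs rely on the bad
				 * offset to advance to the next file.
				 */
				if (ret || (td->o.nr_files == 1 &&
					    offset >= f->io_size)) {
					ret = get_next_rand_block(td, f, ddir, &b);
					offset = -1ULL;
					*is_random = true;
				}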

Also, it seems to me that if we are intending to generate a sequential
offset for this case and succeed in doing so, we should just wrap around to
the beginning of the file instead of choosing a new random offset.
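
Roughly along these lines (again only an untested sketch; the
multi-file "bad offset moves us to the next file" behavior above would
still need auditing):

				ret = get_next_seq_offset(td, f, ddir, &offset);
				/* sketch: wrap the sequential offset instead of
				 * falling back to a random one */
				if (!ret && offset >= f->io_size)
					offset = 0;
				if (ret)
					ret = get_next_rand_block(td, f, ddir, &b);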

In general we have an overwhelming set of ways to generate offsets and
little automated testing for offset generation. In the long run we should
develop a set of tests for offset generation that exercise the different
options. This will help us detect issues and give us confidence that future
code changes won't introduce regressions.

Vincent

Ok, thanks for pointing that out; adding the nrfiles == 1 check should
suffice.

I thought about changing the current sequential offset generation
logic, but as you pointed out it is used for a lot of things.
Modifying it may create further regressions.

One such case is zoned block devices where the zone capacity is less
than the zone size. Currently, if we don't limit the job by specifying
io_size, number_ios, or runtime, fio will exit early without
transferring size bytes of data, because it generates a bad offset,
i.e. an offset outside the I/O region. This is caused by the holes or
gaps between the end of one zone's capacity and the start of the next
zone. All the test cases in t/zbd/test-zbd-support take that into
account.
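
For example (made-up numbers): on a device with a 256 MiB zone size
but a 192 MiB zone capacity, the last 64 MiB of every zone is a gap,
so a plain sequential walk steps into the gap after the first 192 MiB
of each zone and produces an offset fio treats as bad.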

There is another case with gaps, e.g. a workload such as "rw=read:4K",
where fio will exit early without transferring size bytes of data. I
am not sure whether this is the correct behavior. This only happens if
the file size is bigger than f->offset + f->io_size, i.e. we never
reach the end of the file; otherwise the offset wraps around.
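
As a concrete illustration (if I am reading the offset increment
semantics right): with bs=4k and rw=read:4K, each I/O reads 4k and
then skips 4k, so the offset advances 8k per I/O and fio reaches the
end of the I/O region after transferring only about half of size.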

I think handling all these cases, plus percentage_random workloads
with multiple files, is going to be tricky.

One solution I can think of is to just update the documentation for
"size", mentioning that the runtime can also be reduced if there are
holes or gaps during sequential I/O, or if a mixed sequential and
random workload generates an offset outside the I/O region.

Ankit


Ok. It seems like what is happening in
https://github.com/axboe/fio/issues/1486 is that fio is doing
sequential I/O, reaches the end of the device, and then decides that
the job is complete. This is arguably correct behavior. I agree that
the right approach is to improve the documentation, because changing
the code to catch this special case may be risky. I think the io_size
workaround you provided in the ticket is a reasonable resolution.

Vincent



