Hmm, I see a few choices for trying to solve alignment of the offset and forced alignment of sequential blocks:

* Introduce an "offset_align" option solely for aligning the "initial" offset and stop trying to use blockalign in the offset. Document it well to stop confusion with blockalign.
* Document blockalign as ONLY working for random I/O or
* Do blockalign for sequential I/O on a per I/O basis (potentially creating gappy I/O). This is also the request from https://github.com/axboe/fio/issues/341 . Technically this has a big overlap with zoned I/O and the existing "generated offset" used in the rw option but perhaps it's just different enough to be justified?

Thoughts?

On 21 October 2017 at 01:13, Jeff Furlong <jeff.furlong@xxxxxxx> wrote:
> Yes, the bits around line 845 show the min() being used. It seems get_start_offset() is used for the first IO and subsequent IO's, making things difficult. I believe the correct fix here would be to set the ba specific to the io type. The ba parameter allows for read/write/trim alignments. So if io_u->ddir==DDIR_READ then we could just align to o->ba[DDIR_READ]. But I don't see how we would access io_u at this point, we don't know of the potential io_u at this time? It could be a mixed read/write/trim.
>
> Lacking that, brute force would suggest:
>
>     if (fio_option_is_set(o, ba)) {
>         align_bs = (unsigned long long) o->ba[DDIR_READ];
>         if(o->ba[DDIR_READ] != o->ba[DDIR_WRITE])
>             align_bs = (unsigned long long) o->ba[DDIR_READ] * (unsigned long long) o->ba[DDIR_WRITE];
>         if(align_bs != (unsigned long long) o->ba[DDIR_TRIM])
>             align_bs = align_bs * (unsigned long long) o->ba[DDIR_TRIM];
>
> But I see another problem in that o->ba[DDIR_READ] is not set for sequential workloads.
> In fixup_options() we have:
>
>     if (!o->ba[DDIR_READ] || !td_random(td))
>         o->ba[DDIR_READ] = o->min_bs[DDIR_READ];
>     if (!o->ba[DDIR_WRITE] || !td_random(td))
>         o->ba[DDIR_WRITE] = o->min_bs[DDIR_WRITE];
>     if (!o->ba[DDIR_TRIM] || !td_random(td))
>         o->ba[DDIR_TRIM] = o->min_bs[DDIR_TRIM];
>
> I don't follow that code, other than if sequential, set ba to min_bs? If we remove that code and use above change, we can get the starting LBA to be aligned to the ba in the case of
>
> fio --name=test_job --ioengine=libaio --direct=1 --rw=read --iodepth=1 --size=100% --bs=4k --filename=/dev/nvme1n1 --number_ios=8 --offset=50% --log_offset=1 --write_iops_log=test_job --ba=8k
>
> # cat test_job_iops.1.log
> 0, 1, 0, 4096, 1600315899904
>
> It seems to be a hack, so I didn't create a patch for it. Would like to better understand what I'm missing before breaking something.
>
> Thanks.
>
> Regards,
> Jeff
>
>
> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx]
> Sent: Friday, October 20, 2017 2:01 PM
> To: Jeff Furlong <jeff.furlong@xxxxxxx>
> Cc: fio@xxxxxxxxxxxxxxx
> Subject: Re: fio offset with ba
>
> Hi,
>
> On 20 October 2017 at 20:08, Jeff Furlong <jeff.furlong@xxxxxxx> wrote:
>>
>> I don't quite follow the logic in the calculate offset function. The offset parameter recently allows a percentage. Suppose we set it to 50% and want to block align the IO's starting at 50% of device capacity, then block aligned to 8KB.
>>
>> # fio -version
>> fio-3.1-60-g71aa
>>
>> # blockdev --getsize64 /dev/nvme1n1
>> 3200631791616
>>
>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=read
>> --iodepth=1 --size=100% --bs=4k --filename=/dev/nvme1n1 --runtime=1s
>> --offset=50% --log_offset=1 --write_iops_log=test_job --ba=8k
>>
>> # cat test_job_iops.1.log
>> 0, 1, 0, 4096, 1600315895808
>> 0, 1, 0, 4096, 1600315899904
>> 0, 1, 0, 4096, 1600315904000
>> 0, 1, 0, 4096, 1600315908096
>>
>> So we can see the device has 3200631791616 bytes, 50% of which is
>> 1600315895808 bytes, which happens to be 4KB aligned, but not 8KB
>> aligned. Even though we set the --ba=8k parameter, the offset LBA as
>> logged in the iops.1.log shows
>
> Hmm I see the same problem with this job:
> fio --name=test_job --ioengine=null --rw=read --iodepth=1
> --size=3200631791616 --bs=4k --number_ios=1 --offset=50% --ba=8k --debug=io
>
> [...]
> io 15013 fill_io_u: io_u 0x236ad80:
> off=1600315895808/len=4096/ddir=0io 15013 /test_job.0.0io
> 15013
>
> I think your guess about only impacting random I/O is probably right because
> fio --name=test_job --randrepeat=0 --ioengine=null --rw=randread
> --iodepth=1 --size=3200631791616 --bs=4k --number_ios=1 --offset=50% --ba=8k --debug=io
>
> picks offsets that are 8k aligned.
>
>> 4KB alignment. Does --ba work for all IO's or only random IO's? If all, does get_start_offset() control the raw offset value? I don't see why the min(ba, bs) is used in the calculation, but perhaps I am missing something. Thanks.
>
> Where is min(ba, bs) done - do you mean the bits around https://github.com/axboe/fio/commit/89978a6b26f81bdbd63228e2e2a86f604ee46c56#diff-4abbf037246dd2e450dc3f6a2ac77180R845?
> I agree you probably want to take the maximum of all the block alignments but what if one of the smaller ones is not a multiple of the largest one?
>
> Would you like to propose a patch?
>
> --
> Sitsofe | http://sucs.org/~sits/

--
Sitsofe | http://sucs.org/~sits/
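
As a rough illustration of the alignment arithmetic discussed in this thread, here is a minimal sketch in C. It is not fio code and not a proposed patch: the helper names (gcd_u64, lcm_u64, combined_align, align_up) are hypothetical. It assumes that, when the read/write/trim alignments differ, a least common multiple is wanted rather than the maximum or the plain product, since the LCM is the smallest value every alignment divides, including the case where a smaller alignment is not a factor of the largest one.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor (Euclid); used to build the LCM below. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Least common multiple of two non-zero alignments. */
static uint64_t lcm_u64(uint64_t a, uint64_t b)
{
	assert(a != 0 && b != 0);
	/* Divide before multiplying to keep the intermediate value small. */
	return a / gcd_u64(a, b) * b;
}

/*
 * Hypothetical helper (not part of fio): the smallest alignment that
 * satisfies the read, write and trim alignments at the same time.
 */
static uint64_t combined_align(uint64_t ba_read, uint64_t ba_write,
			       uint64_t ba_trim)
{
	return lcm_u64(lcm_u64(ba_read, ba_write), ba_trim);
}

/*
 * Round offset up to the next multiple of align. Uses the modulo so that
 * align does not have to be a power of two.
 */
static uint64_t align_up(uint64_t offset, uint64_t align)
{
	assert(align != 0);
	uint64_t rem = offset % align;

	return rem ? offset + (align - rem) : offset;
}
```

With the numbers from the thread, align_up(1600315895808, 8192) gives 1600315899904, which is the first offset logged in the run with the fixup_options() lines removed. Whether the start offset should round up or down, and whether it belongs in a separate option, is exactly the open question above.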