On 3/24/23 3:14 PM, Linus Torvalds wrote:
> On Fri, Mar 24, 2023 at 1:44 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> We've been doing a few conversions of ITER_IOVEC to ITER_UBUF in select
>> spots, as the latter is cheaper to iterate and hence saves some cycles.
>> I recently experimented [1] with io_uring converting single segment READV
>> and WRITEV into non-vectored variants, as we can save some cycles through
>> that as well.
>>
>> But there's really no reason why we can't just do this further down,
>> enabling it for everyone. It's quite common to use vectored reads or
>> writes even with a single segment, unfortunately, even for cases where
>> there's no specific reason to do so. From a bit of non-scientific
>> testing on a vm on my laptop, I see about 60% of the import_iovec()
>> calls being for a single segment.
>
> I obviously think this is the RightThing(tm) to do, but it's probably
> too late for 6.3 since there is the worry that somebody "knows" that
> it's a IOVEC somewhere.
>
> Even if it sounds unlikely, and wrong.

Agree, wasn't really targeting 6.3, though after looking over it, I do
feel better about the whole thing. I already ran the io_uring test and
it showed a nice win. I also wrote a small micro benchmark that just
does 10M 4k reads from /dev/zero.

First observation from the below numbers is that copying just a single
vec is EXPENSIVE. But I already knew that from the io_uring testing,
where we're spending ~8% just on that alone. Secondly, readv(..., 1)
saves about 3% with the patches in this series.

read-zero takes one argument, which is to do vectored reads or not.

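For reference, a minimal sketch of what such a benchmark could look like.
This is not the program that produced the numbers in this mail; the
read_zero() helper, the BUF_SIZE name and the meaning of the argument
(0 for read(), non-zero for a single segment readv()) are assumptions
made for illustration.

```c
/*
 * Sketch of a read-zero style micro benchmark. Hypothetical code, not
 * the program that produced the numbers in this mail.
 */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BUF_SIZE	4096

/*
 * Do nr_reads reads of BUF_SIZE bytes from /dev/zero. If vectored is
 * non-zero, use readv() with a single segment, otherwise plain read().
 * Returns the total number of bytes read, or -1 on error or short read.
 */
static long long read_zero(int vectored, unsigned long nr_reads)
{
	static char buf[BUF_SIZE];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	long long total = 0;
	unsigned long i;
	int fd;

	fd = open("/dev/zero", O_RDONLY);
	if (fd < 0)
		return -1;

	for (i = 0; i < nr_reads; i++) {
		ssize_t ret;

		if (vectored)
			ret = readv(fd, &iov, 1);	/* one iovec segment */
		else
			ret = read(fd, buf, sizeof(buf));
		if (ret != (ssize_t) sizeof(buf)) {
			close(fd);
			return -1;
		}
		total += ret;
	}

	close(fd);
	return total;
}
```

Calling read_zero(atoi(argv[1]), 10000000UL) from main() and running the
result under "time taskset -c 0" would roughly correspond to the
invocations in this mail, under the assumptions stated above.
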
Stock kernel:

axboe@r7525 ~> time taskset -c 0 ./read-zero 0

________________________________________________________
Executed in  859.98 millis    fish           external
   usr time  210.10 millis  291.00 micros  209.81 millis
   sys time  649.42 millis    0.00 micros  649.42 millis

axboe@r7525 ~> time taskset -c 0 ./read-zero 0

________________________________________________________
Executed in  853.82 millis    fish           external
   usr time  228.45 millis  304.00 micros  228.15 millis
   sys time  624.92 millis    0.00 micros  624.92 millis

axboe@r7525 ~> time taskset -c 0 ./read-zero 1

________________________________________________________
Executed in    1.84 secs      fish           external
   usr time    0.21 secs    218.00 micros    0.21 secs
   sys time    1.63 secs    101.00 micros    1.63 secs

axboe@r7525 ~> time taskset -c 0 ./read-zero 1

________________________________________________________
Executed in    1.83 secs      fish           external
   usr time    0.18 secs    594.00 micros    0.18 secs
   sys time    1.64 secs      0.00 micros    1.64 secs

And with the patches:

axboe@r7525 ~> time taskset -c 0 ./read-zero 1

________________________________________________________
Executed in    1.78 secs      fish           external
   usr time    0.22 secs    141.00 micros    0.22 secs
   sys time    1.56 secs    141.00 micros    1.56 secs

axboe@r7525 ~> time taskset -c 0 ./read-zero 1

________________________________________________________
Executed in    1.78 secs      fish           external
   usr time    0.19 secs      0.00 micros    0.19 secs
   sys time    1.59 secs    509.00 micros    1.59 secs

read-zero 0 the same with patches, as expected.

> Adding Al, who tends to be the main iovec person.
>
> Al, see
>
> https://lore.kernel.org/all/20230324204443.45950-1-axboe@xxxxxxxxx/
>
> for the series if you didn't already see it on fsdevel.

Yep sorry, forgot to add Al.

-- 
Jens Axboe