I haven't taken a close look at the code yet, but one quick note:
patches like this should be against the branches for 5.11. In fact,
this one doesn't even compile against current -git, as
blk_mq_bio_list_merge is now called blk_bio_list_merge.
Ugh, I guess Jaehyun had this patch bottled up and didn't rebase
before submitting. Sorry about that.
In any case, I was curious, so I ran this through some quick peak
testing, and I'm seeing about a 20% drop in peak IOPS compared to
running with "none". Perf diff:
10.71% -2.44% [kernel.vmlinux] [k] read_tsc
2.33% -1.99% [kernel.vmlinux] [k] _raw_spin_lock
Did you run this with nvme, or with null_blk? I'd guess neither would
benefit from this: if the underlying device does not benefit from
batching (at least enough to cover the extra cost of accounting for
it), using this scheduler will be counterproductive.
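To make that trade-off concrete, here is a toy cost model (my own illustration, not code from the patch): batching amortizes a fixed per-submission cost, such as an NVMe doorbell write, across B I/Os, but adds a small per-I/O accounting overhead. Whether batching wins depends on which cost dominates for the device. All numbers below are made up for illustration.

```python
def iops(per_io_us, submit_us, batch, overhead_us=0.0):
    """I/Os per second under a simple linear cost model.

    per_io_us:   per-I/O processing cost (microseconds)
    submit_us:   fixed cost per submission (amortized across the batch)
    batch:       number of I/Os submitted together
    overhead_us: extra per-I/O accounting cost introduced by batching
    """
    us_per_io = per_io_us + overhead_us + submit_us / batch
    return 1e6 / us_per_io

# Fast device where submission cost is comparable to I/O cost:
# amortizing it across a batch of 16 helps despite the overhead.
fast_unbatched = iops(per_io_us=1.0, submit_us=1.0, batch=1)
fast_batched   = iops(per_io_us=1.0, submit_us=1.0, batch=16, overhead_us=0.1)

# Slow device where per-I/O cost dominates: the accounting overhead
# outweighs the tiny amortized submission saving.
slow_unbatched = iops(per_io_us=100.0, submit_us=1.0, batch=1)
slow_batched   = iops(per_io_us=100.0, submit_us=1.0, batch=16, overhead_us=5.0)

print(fast_batched > fast_unbatched)   # batching helps here
print(slow_batched < slow_unbatched)   # batching hurts here
```

This is only a sketch of the argument above, not a model of the actual block-layer code paths.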
This is nvme, actual device. The initial posting could be a bit more
explicit on the use case, it says:
"For NVMe SSDs, the i10 I/O scheduler achieves ~60% improvements in
terms of IOPS per core over "noop" I/O scheduler."
which made me very skeptical, as it sounds like it's raw device claims.
You are absolutely right, that needs to be fixed.
That does beg the question of why this is a new scheduler, then. It's
pretty basic stuff, something that could trivially be added as a side
effect in the core (and in fact we have much of it already). It doesn't
really seem to warrant a new scheduler at all; there isn't much in there.
Not saying it absolutely warrants a new one, and I guess it could sit
in the core, but this attempts to optimize for a specific metric while
trading off others, which is exactly what I/O schedulers are for:
optimizing for a specific metric.
Not sure we want to build something that biases towards throughput at
the expense of latency into the block core. And, as mentioned, this is
not well suited to all device types...
But if you think this has a better home, I assume the authors will be
open to that.
Was curious and wanted to look it up, but it doesn't exist.
I think this is the right one:
https://github.com/i10-kernel/upstream-linux/blob/master/i10-evaluation.pdf
We had some back and forth around the naming, hence this was probably
omitted.
That works, though my local results were a bit worse than what's listed
in there.
And what does this mean:
"We note that Linux I/O scheduler introduces an additional kernel worker
thread at the I/O dispatching stage"
It most certainly does not for the common/hot case.
Yes, I agree, I hadn't seen the local results. Probably some
misunderstanding or a typo; I'll let them reply on this.