On 01/15/2015 12:51 PM, Dan Williams wrote:
> On Thu, Jan 15, 2015 at 11:15 AM, Jens Axboe <axboe@xxxxxx> wrote:
>> On 01/15/2015 11:59 AM, Dan Williams wrote:
>>> I still don't understand what we get by adding this new allocator
>>> besides complexity, am I missing something?
>>
>> Two things:
>>
>> - libata tag allocator sucks. Like seriously sucks, it's almost a worst case
>>   implementation.
>
> Not questioning its suckiness, but I thought the SATA suckiness made
> it moot. Apparently not in all cases...

The laptop I'm typing this from does 145K 4k random read IOPS, it's
definitely into the area of it mattering.

>> - Much better to have a single unified allocator to tweak and tune, than
>>   having separate versions.
>>
>> #2 is still lacking a bit, but I don't think it'd be impossible to unify it
>> all.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=87101 has gone silent, I
> need to ping it. That's my primary concern with the current proposal,
> supporting controllers that have weird/unnatural relationships with
> the value of the tag.

Unfortunately, parts of SATA are as crappy as USB when it comes to things
like that. I can understand why some controllers would like to see a
natural ordering of the tags (even if it is stupid to require, but AHCI
doesn't help there), but it makes very little sense why it would break
others. Looks like this particular case was likely a different bug, the
ordering just made it show up more easily.
And speaking of strict ordering, the blk-mq tagging should actually
improve ordering. The libata implementation orders globally, but that'll
equally break down on multiple processes accessing the device. For that
case, you end up interleaving, and if the drive does strict by-tag
ordering of what IO to do, it'll go random pretty quickly. The blk-mq
implementation preserves ordering between threads in that case, due to
how the last tag is cached. So I would expect to see an improvement in
behavior with that for use cases that offload IO to thread pools (like
posix aio, or private implementations in programs).
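
To make the interleaving argument above concrete, here is a toy model, not
the libata or blk-mq code. It assumes a drive that services commands strictly
in tag order, a hypothetical queue depth of 32, two submitters "A" and "B"
that each read sequentially, and cached hints that happen to start 16 tags
apart. All function names and numbers are made up for illustration.

```python
DEPTH = 32  # hypothetical queue depth for this toy model


def alloc_from(free, start):
    """Return the first free tag at or after `start`, wrapping around."""
    for i in range(DEPTH):
        tag = (start + i) % DEPTH
        if tag in free:
            free.remove(tag)
            return tag
    raise RuntimeError("no free tags")


def issue_global(submissions):
    """One shared cursor: every submitter continues where the last one stopped."""
    free, cursor, tagged = set(range(DEPTH)), 0, {}
    for owner, lba in submissions:
        tag = alloc_from(free, cursor)
        cursor = (tag + 1) % DEPTH
        tagged[tag] = (owner, lba)
    return tagged


def issue_cached(submissions, initial_hints):
    """One cached hint per submitter: each continues from its own last tag."""
    free, hints, tagged = set(range(DEPTH)), dict(initial_hints), {}
    for owner, lba in submissions:
        tag = alloc_from(free, hints[owner])
        hints[owner] = (tag + 1) % DEPTH
        tagged[tag] = (owner, lba)
    return tagged


def service_order(tagged):
    """LBAs in the order a strictly by-tag drive would service them."""
    return [tagged[tag][1] for tag in sorted(tagged)]


def jumps(lbas):
    """Count non-sequential steps in a service order."""
    return sum(1 for a, b in zip(lbas, lbas[1:]) if b != a + 1)


# Two submitters, each reading sequentially, with requests arriving interleaved.
submissions = []
for i in range(4):
    submissions.append(("A", 100 + i))
    submissions.append(("B", 900 + i))

global_order = service_order(issue_global(submissions))
cached_order = service_order(issue_cached(submissions, {"A": 0, "B": 16}))

print("global cursor:", global_order, "jumps:", jumps(global_order))
print("cached hints: ", cached_order, "jumps:", jumps(cached_order))
```

In this model the shared cursor alternates between the two streams on every
tag, while the per-submitter hints keep each stream contiguous in tag space.
How far apart real cached tags end up depends on the allocator and the
workload, so treat the numbers as illustrative only.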
--
Jens Axboe