For this case, it seems the opposite - less is more. And I seem to
be hitting closer to the sweet spot there, with more merges.
I think cheaper SSDs have a write latency problem due to erase block
issues. I suspect all SSDs have a channel problem in that there's a
certain number of parallel channels and once you go over that number
they can't actually work on any more operations even if they can queue
them. For cheaper (as in fewer channels, and less spare erased block
capacity) SSDs there will be a benefit to reducing the depth to some
multiplier of the channels (I'd guess 2-4 as the multiplier). When
SSDs become write throttled, there may be less benefit to us queueing
in the block layer (merging produces bigger packets with lower
overhead, but the erase block consumption will remain the same).
For the record, the internet thinks that cheap SSDs have 2-4 channels,
so that would argue for a tag depth somewhere from 4-16.
I have seen up to 10-channel devices mentioned as being "high end" - this
would mean a queue depth of up to 40 using a 4x multiplier; so, based on
that, the current value of 254 for that driver seems way off.
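
To make the arithmetic above concrete, here is a minimal sketch of the
"channels times multiplier" rule of thumb. The function name is made up
for illustration, and the channel counts are the guesses quoted in this
thread, not values read from any device:

```python
def suggested_tag_depth(channels: int, multiplier: int) -> int:
    """Rule-of-thumb queue depth: a small multiple of the parallel channels.

    Hypothetical helper; channel counts are not exposed by most SSDs,
    so the inputs here are assumptions, not measured values.
    """
    if channels < 1 or multiplier < 1:
        raise ValueError("channels and multiplier must be >= 1")
    return channels * multiplier


# Cheap SSDs: 2-4 channels with a 2-4x multiplier.
cheap_low = suggested_tag_depth(2, 2)    # 4
cheap_high = suggested_tag_depth(4, 4)   # 16

# "High end" device: 10 channels with a 4x multiplier.
high_end = suggested_tag_depth(10, 4)    # 40

# Driver value mentioned in this thread, for comparison.
CURRENT_DRIVER_DEPTH = 254
print(cheap_low, cheap_high, high_end, CURRENT_DRIVER_DEPTH)
```

Even the most generous case here comes out at 40, well under 254.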
SSDs have a peculiar lifetime problem in that when they get
erase block starved they start behaving more like spinning rust in
that they reach a processing limit but only for writes, so lowering
the write queue depth (which we don't even have a knob for) might
be a good solution. Trying to track the erase block problem has
been a constant bugbear.
I am only doing read performance tests here, and the disks are SAS3.0
SSDs (HUSMM1640ASS204), so not exactly slow.
Possibly ... the stats on most manufacturer SSDs don't give you
information about the channels or spare erase blocks.
For my particular disk, this is the datasheet/manual:
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/data-sheet-ultrastar-ssd1600ms.pdf
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/product-manual-ultrastar-ssd1600mr-1-92tb.pdf
And I didn't see explicit info regarding channels or spare erase blocks,
as you expected.
I'm assuming you're using spinning rust in the above, so it sounds
like the firmware in the card might be eating the queue full
returns. I could see this happening in RAID mode, but it shouldn't
happen in JBOD mode.
Not sure on that, but I didn't check too much. I did try to increase
fio queue depth and sdev queue depth to be very large to clobber the
disks, but still nothing.
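
For anyone wanting to repeat the sdev queue depth part of that
experiment, a rough sketch follows. It assumes the usual SCSI sysfs
attribute at /sys/block/<dev>/device/queue_depth; the helper names and
the sysfs_root parameter (there only so the code can be pointed at a
scratch directory) are made up for illustration. Writing the real
attribute needs root, and the driver may clamp the value:

```python
from pathlib import Path


def queue_depth_path(dev: str, sysfs_root: str = "/sys") -> Path:
    """Path of the per-device queue depth attribute for a SCSI disk."""
    return Path(sysfs_root) / "block" / dev / "device" / "queue_depth"


def read_queue_depth(dev: str, sysfs_root: str = "/sys") -> int:
    """Return the queue depth currently reported for the device."""
    return int(queue_depth_path(dev, sysfs_root).read_text().strip())


def write_queue_depth(dev: str, depth: int, sysfs_root: str = "/sys") -> None:
    """Request a new queue depth; read it back to see what was accepted."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    queue_depth_path(dev, sysfs_root).write_text(f"{depth}\n")
```

Usage would be something like write_queue_depth("sdb", 40) followed by
read_queue_depth("sdb"), where "sdb" is just an example device name.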
If it's an SSD it's likely not giving the queue full you'd need to get
the mid-layer to throttle automatically.
So it seems that the queue depth we select should depend on the class of
device, but then the value can also affect write performance.
As for my issue today, I can propose a smaller value for the mpt3sas
driver based on my limited tests, and see how the driver maintainers
feel about it.
I just wonder what intelligence we can add for this. And whether LLDDs
should be selecting this (queue depth) at all, unless they (the HBA)
have some limits themselves.
You did mention maybe a separate write queue depth - could this be a
solution?
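
To illustrate what I mean by a separate write queue depth, here is a toy
model only. Nothing like this exists as a knob today, as noted above;
the class name and the depth values are invented for the example:

```python
class SplitDepthLimiter:
    """Toy admission model with independent read and write depth limits.

    Hypothetical sketch, not kernel code: reads and writes are counted
    separately, so throttled writes do not consume the read budget.
    """

    def __init__(self, read_depth: int, write_depth: int):
        self.limits = {"read": read_depth, "write": write_depth}
        self.inflight = {"read": 0, "write": 0}

    def try_start(self, op: str) -> bool:
        """Admit one request of type op if its own limit allows it."""
        if self.inflight[op] >= self.limits[op]:
            return False
        self.inflight[op] += 1
        return True

    def complete(self, op: str) -> None:
        """Mark one in-flight request of type op as finished."""
        if self.inflight[op] == 0:
            raise RuntimeError(f"no {op} request in flight")
        self.inflight[op] -= 1


# Example values only: reads at 40, writes held to a smaller depth of 8.
limiter = SplitDepthLimiter(read_depth=40, write_depth=8)
admitted_writes = sum(limiter.try_start("write") for _ in range(10))
print(admitted_writes, limiter.try_start("read"))
```

With those example limits, only 8 of the 10 writes are admitted, while
reads still get through.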
Thanks,
John