On Tue, Jan 12, 2021 at 5:31 AM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> >>
> >> For this case, it seems the opposite - less is more. And I seem to
> >> be hitting closer to the sweet spot there, with more merges.
> >
> > I think cheaper SSDs have a write latency problem due to erase block
> > issues. I suspect all SSDs have a channel problem in that there's a
> > certain number of parallel channels, and once you go over that number
> > they can't actually work on any more operations even if they can
> > queue them. For cheaper (as in fewer channels, and less spare erased
> > block capacity) SSDs there will be a benefit to reducing the depth to
> > some multiplier of the channels (I'd guess 2-4 as the multiplier).
> > When SSDs become write throttled, there may be less benefit to us
> > queueing in the block layer (merging produces bigger packets with
> > lower overhead, but the erase block consumption will remain the same).
> >
> > For the record, the internet thinks that cheap SSDs have 2-4 channels,
> > so that would argue for a tag depth somewhere from 4-16.
>
> I have seen up to 10-channel devices mentioned as being "high end" -
> this would mean up to 40 queue depth using a 4x multiplier; so, based
> on that, the current value of 254 for that driver seems way off.
>
> >
> >>> SSDs have a peculiar lifetime problem in that when they get
> >>> erase block starved they start behaving more like spinning rust in
> >>> that they reach a processing limit but only for writes, so lowering
> >>> the write queue depth (which we don't even have a knob for) might
> >>> be a good solution. Trying to track the erase block problem has
> >>> been a constant bugbear.
> >>
> >> I am only doing read performance tests here, and the disks are
> >> SAS3.0 SSDs (HUSMM1640ASS204), so not exactly slow.
>
> > Possibly ... the stats on most manufacturer SSDs don't give you
> > information about the channels or spare erase blocks.
>
> For my particular disk, this is the datasheet/manual:
> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/data-sheet-ultrastar-ssd1600ms.pdf
>
> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/product-manual-ultrastar-ssd1600mr-1-92tb.pdf
>
> And I didn't see explicit info regarding channels or spare erase
> blocks, as you expect.
>

John,

In that datasheet, I see the model number designator "MR", which stands
for "Multi level cell, read-intensive (2 drive writes per day)".

Compare that to the 1600MM drive: "Multi level cell, mainstream
endurance (10 drive writes per day)":
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/data-sheet-ultrastar-ssd1600mm.pdf

Also, note that the quoted "Write IOPS (max IOPS, random 4k)" on the
datasheet is 30,000 for the 1600MR drive and 100,000 IOPS for the
1600MM drive.

Thanks,
Bryan

> >
> >>> I'm assuming you're using spinning rust in the above, so it sounds
> >>> like the firmware in the card might be eating the queue full
> >>> returns. I could see this happening in RAID mode, but it shouldn't
> >>> happen in jbod mode.
> >>
> >> Not sure on that, but I didn't check too much. I did try to increase
> >> fio queue depth and sdev queue depth to be very large to clobber the
> >> disks, but still nothing.
> >
> > If it's an SSD it's likely not giving the queue full you'd need to
> > get the mid-layer to throttle automatically.
> >
> So it seems that the queue depth we select should depend on class of
> device, but then the value can also affect write performance.
>
> As for my issue today, I can propose a smaller value for the mpt3sas
> driver based on my limited tests, and see how the driver maintainers
> feel about it.
>
> I just wonder what intelligence we can add for this. And whether LLDDs
> should be selecting this (queue depth) at all, unless they (the HBA)
> have some limits themselves.
>
> You did mention maybe a separate write queue depth - could this be a
> solution?
>
> Thanks,
> John
>
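
P.S. To make the "depth should depend on class of device" idea above a
bit more concrete, below is a rough sketch (not actual mpt3sas code) of
how an LLDD could pick a per-device depth from its ->slave_configure()
hook instead of hard-coding one value such as 254.
scsi_change_queue_depth() is the existing SCSI midlayer helper;
example_device_is_ssd() and the two depth constants are made up purely
for illustration (the 40 follows the 10-channel x 4 estimate discussed
earlier in the thread).

#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/* Illustrative values only: 40 ~= 10 channels x 4x multiplier. */
#define EXAMPLE_SSD_QUEUE_DEPTH		40
#define EXAMPLE_DEFAULT_QUEUE_DEPTH	254

/*
 * Hypothetical helper: a real driver would have to detect the medium
 * itself, e.g. from the VPD Block Device Characteristics page
 * (medium rotation rate).
 */
static bool example_device_is_ssd(struct scsi_device *sdev)
{
	return false;	/* placeholder */
}

static int example_slave_configure(struct scsi_device *sdev)
{
	int depth = example_device_is_ssd(sdev) ?
			EXAMPLE_SSD_QUEUE_DEPTH : EXAMPLE_DEFAULT_QUEUE_DEPTH;

	/* Core SCSI helper LLDDs use to set the per-device queue depth. */
	scsi_change_queue_depth(sdev, depth);
	return 0;
}

The depth can also be changed at runtime from userspace through the
device's sysfs queue_depth attribute, which may be handy for
experimenting before settling on a driver default.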