On Tue, Jan 12, 2021 at 5:31 AM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> >>
> >> For this case, it seems the opposite - less is more. And I seem to
> >> be hitting closer to the sweet spot there, with more merges.
> >
> > I think cheaper SSDs have a write latency problem due to erase block
> > issues. I suspect all SSDs have a channel problem in that there's a
> > certain number of parallel channels, and once you go over that number
> > they can't actually work on any more operations even if they can
> > queue them. For cheaper (as in fewer channels, and less spare erased
> > block capacity) SSDs there will be a benefit to reducing the depth to
> > some multiplier of the channels (I'd guess 2-4 as the multiplier).
> > When SSDs become write throttled, there may be less benefit to us
> > queueing in the block layer (merging produces bigger packets with
> > lower overhead, but the erase block consumption will remain the same).
> >
> > For the record, the internet thinks that cheap SSDs have 2-4 channels,
> > so that would argue for a tag depth somewhere from 4-16.
>
> I have seen up to 10-channel devices mentioned as being "high end" -
> this would mean up to 40 queue depth using a 4x multiplier; so, based
> on that, the current value of 254 for that driver seems way off.
>
> >
> >>> SSDs have a peculiar lifetime problem in that when they get
> >>> erase block starved they start behaving more like spinning rust in
> >>> that they reach a processing limit but only for writes, so lowering
> >>> the write queue depth (which we don't even have a knob for) might
> >>> be a good solution. Trying to track the erase block problem has
> >>> been a constant bugbear.
> >>
> >> I am only doing read performance tests here, and the disks are
> >> SAS3.0 SSDs (HUSMM1640ASS204), so not exactly slow.
>
> > Possibly ... the stats on most manufacturer SSDs don't give you
> > information about the channels or spare erase blocks.
>
> For my particular disk, this is the datasheet/manual:
> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/data-sheet-ultrastar-ssd1600ms.pdf
>
> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/product-manual-ultrastar-ssd1600mr-1-92tb.pdf
>
> And I didn't see explicit info regarding channels or spare erase
> blocks, as you expect.
>

John,

In that datasheet, I see the model number designator "MR", which stands
for "Multi level cell, read-intensive (2 drive writes per day)".

Compare that to the 1600MM drive: "Multi level cell, mainstream
endurance (10 drive writes per day)":
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sas-series/data-sheet-ultrastar-ssd1600mm.pdf

Also, note that the quoted "Write IOPS (max IOPS, random 4k)" on the
datasheet is 30,000 for the 1600MR drive and 100,000 IOPS for the
1600MM drive.

Thanks,
Bryan

> >
> >>> I'm assuming you're using spinning rust in the above, so it sounds
> >>> like the firmware in the card might be eating the queue full
> >>> returns. I could see this happening in RAID mode, but it shouldn't
> >>> happen in jbod mode.
> >>
> >> Not sure on that, but I didn't check too much. I did try to increase
> >> fio queue depth and sdev queue depth to be very large to clobber the
> >> disks, but still nothing.
> >
> > If it's an SSD it's likely not giving the queue full you'd need to
> > get the mid-layer to throttle automatically.
> >
> So it seems that the queue depth we select should depend on class of
> device, but then the value can also affect write performance.
>
> As for my issue today, I can propose a smaller value for the mpt3sas
> driver based on my limited tests, and see how the driver maintainers
> feel about it.
>
> I just wonder what intelligence we can add for this. And whether LLDDs
> should be selecting this (queue depth) at all, unless they (the HBA)
> have some limits themselves.
>
> You did mention maybe a separate write queue depth - could this be a
> solution?
>
> Thanks,
> John
>
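
P.S. To make the "depth should depend on class of device" idea above a
bit more concrete, below is a rough sketch (not actual mpt3sas code) of
how an LLDD could pick a per-device depth from its ->slave_configure()
hook instead of hard-coding one value such as 254.
scsi_change_queue_depth() is the existing SCSI midlayer helper;
example_device_is_ssd() and the two depth constants are made up purely
for illustration (the 40 follows the 10-channel x 4 estimate discussed
earlier in the thread).

#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/* Illustrative values only: 40 ~= 10 channels x 4x multiplier. */
#define EXAMPLE_SSD_QUEUE_DEPTH		40
#define EXAMPLE_DEFAULT_QUEUE_DEPTH	254

/*
 * Hypothetical helper: a real driver would have to detect the medium
 * itself, e.g. from the VPD Block Device Characteristics page
 * (medium rotation rate).
 */
static bool example_device_is_ssd(struct scsi_device *sdev)
{
	return false;	/* placeholder */
}

static int example_slave_configure(struct scsi_device *sdev)
{
	int depth = example_device_is_ssd(sdev) ?
			EXAMPLE_SSD_QUEUE_DEPTH : EXAMPLE_DEFAULT_QUEUE_DEPTH;

	/* Core SCSI helper LLDDs use to set the per-device queue depth. */
	scsi_change_queue_depth(sdev, depth);
	return 0;
}

The depth can also be changed at runtime from userspace through the
device's sysfs queue_depth attribute, which may be handy for
experimenting before settling on a driver default.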