Hi Bart,

Thanks for pointing this out.
Yes, the purpose of my patch is exactly the same as that of Ming's patch
you referred to, although it achieves that purpose in a different way.
If the earlier patch makes it upstream, then my patch is not needed.

Thanks,
Sumanesh

-----Original Message-----
From: Bart Van Assche [mailto:bvanassche@xxxxxxx]
Sent: Tuesday, November 19, 2019 4:22 PM
To: Sumanesh Samanta; axboe@xxxxxxxxx; linux-block@xxxxxxxxxxxxxxx;
 jejb@xxxxxxxxxxxxx; martin.petersen@xxxxxxxxxx;
 linux-scsi@xxxxxxxxxxxxxxx; ming.lei@xxxxxxxxxx;
 sathya.prakash@xxxxxxxxxxxx; chaitra.basappa@xxxxxxxxxxxx;
 suganath-prabu.subramani@xxxxxxxxxxxx; kashyap.desai@xxxxxxxxxxxx;
 sumit.saxena@xxxxxxxxxxxx; shivasharan.srikanteshwara@xxxxxxxxxxxx;
 emilne@xxxxxxxxxx; hch@xxxxxx; hare@xxxxxxx; bart.vanassche@xxxxxxx
Subject: Re: [PATCH 1/1] scsi core: limit overhead of device_busy counter for SSDs

On 11/19/19 12:07 PM, Sumanesh Samanta wrote:
> From: root <sumanesh.samanta@xxxxxxxxxxxx>
>
> Recently a patch was delivered to remove the host_busy counter from the
> SCSI mid layer. That counter was a major bottleneck, and removing it
> helped improve SCSI stack performance.
> With that patch, the bottleneck moved to the scsi_device device_busy
> counter. The performance issue with this counter is seen more in cases
> where a single device can produce very high IOPs, for example h/w RAID
> devices where the OS sees one device, but there are many drives behind
> it, making it capable of very high IOPs. The effect is also visible
> when cores from multiple NUMA nodes send IO to the same device or the
> same controller.
> The device_busy counter is not needed by controllers which can handle
> as many IOs as are submitted to them. Rotating media still uses it for
> merging IO, but for non-rotating SSD drives it becomes a major
> bottleneck as described above.
>
> A few weeks back, a patch was provided to address the device_busy
> counter as well, but unfortunately that had some issues:
> 1. There was a functional issue discovered:
> https://lists.01.org/hyperkitty/list/lkp@xxxxxxxxxxxx/thread/VFKDTG4XC4VHWX5KKDJJI7P36EIGK526/
> 2. There was some concern about existing drivers using the device_busy
> counter.
>
> This patch is an attempt to address both of the above issues.
> For this patch to be effective, LLDs need to set a specific flag,
> use_per_cpu_device_busy, in the scsi_host_template. For other drivers
> (which do not set the flag), this patch would be a no-op, and should
> not affect their performance or functionality at all.
>
> Also, this patch does not fundamentally change any logic or
> functionality of the code. All it does is replace device_busy with a
> per-CPU counter. In the fast path, each CPU increments/decrements its
> own counter. In the relatively slow path, callers use the
> scsi_device_busy function to get the total number of IOs outstanding
> on a device. The only functional aspect it changes is that for
> non-rotating media, the number of IOs to a device is not restricted.
> Controllers which can handle that can set the use_per_cpu_device_busy
> flag in scsi_host_template to take advantage of this patch. Other
> controllers need not modify any code and would work as usual.
> Since the patch does not modify any other functional aspects, it should
> not have any side effects even for drivers that do set the
> use_per_cpu_device_busy flag.

Hi Sumanesh,

Can you have a look at the following patch series and see whether it has
perhaps the same purpose as your patch?

https://lore.kernel.org/linux-scsi/20191118103117.978-1-ming.lei@xxxxxxxxxx/

Thanks,

Bart.
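
For readers following the thread, the per-CPU counting idea described in the quoted patch text can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the patch itself: the names sharded_busy, busy_init, busy_inc, busy_dec and busy_total are hypothetical, NR_SHARDS stands in for the number of CPUs, the 64-byte cache line is assumed, and C11 atomics stand in for the kernel's per-CPU primitives.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_SHARDS  4   /* hypothetical: stands in for the number of CPUs */
#define CACHE_LINE 64  /* assumed cache-line size */

/* One counter per shard, aligned so that shards do not share a cache line. */
struct busy_shard {
    _Alignas(CACHE_LINE) atomic_long count;
};

struct sharded_busy {
    struct busy_shard shard[NR_SHARDS];
};

static void busy_init(struct sharded_busy *b)
{
    for (size_t i = 0; i < NR_SHARDS; i++)
        atomic_init(&b->shard[i].count, 0);
}

/* Fast path: the submitter touches only its own shard. */
static void busy_inc(struct sharded_busy *b, unsigned int cpu)
{
    atomic_fetch_add_explicit(&b->shard[cpu % NR_SHARDS].count, 1,
                              memory_order_relaxed);
}

/*
 * Fast path: a completion may run on a different CPU than the submission,
 * so a single shard can go negative; only the sum is meaningful.
 */
static void busy_dec(struct sharded_busy *b, unsigned int cpu)
{
    atomic_fetch_sub_explicit(&b->shard[cpu % NR_SHARDS].count, 1,
                              memory_order_relaxed);
}

/* Slow path: walk all shards to get the total number of outstanding IOs. */
static long busy_total(struct sharded_busy *b)
{
    long sum = 0;

    for (size_t i = 0; i < NR_SHARDS; i++)
        sum += atomic_load_explicit(&b->shard[i].count,
                                    memory_order_relaxed);
    return sum;
}
```

The trade-off in this sketch is that increments and decrements avoid contention on one shared cache line, while reading the total costs a pass over all shards and is only a snapshot if other threads are updating concurrently.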