Re: [PATCH v15 00/19] Improve write performance for zoned UFS devices

Bart Van Assche <bvanassche@xxxxxxx> · Mon, 27 Nov 2023 11:35:48 -0800

On 11/26/23 23:09, Christoph Hellwig wrote:
> I still think it is a very bad idea to add this amount of complexity to
> the SCSI code, for a model that can't work for the general case and
> diverges from the established NVMe model.

Hi Christoph,

Here is some additional background information:
* UFS vendors prefer the SCSI command set because they combine it with the
  M-PHY transport layer. This combination is more power efficient than NVMe
  over PCIe. According to the information I have available, power consumption
  in the M-PHY hibernation state is lower than in the PCIe L2 state. I have
  not yet heard about any attempts to combine the NVMe command set with the
  M-PHY transport layer. Even if this were possible, it would fragment
  the mobile storage market and increase the price of mobile storage
  devices, which is undesirable.
* I think that the "established NVMe model" in your email refers to the NVMe
  zone append command. As you know there is no zone append in the SCSI ZBC
  standard.
* Using the software implementation of REQ_OP_ZONE_APPEND in drivers/scsi/sd_zbc.c
  is not an option. REQ_OP_ZONE_APPEND commands are serialized by that
  implementation. This serialization is unavoidable because a SCSI device
  may respond with a unit attention condition to any SCSI command. Hence,
  even if REQ_OP_ZONE_APPEND commands are submitted in order, they may be
  executed out of order. We do not want any serialization of SCSI commands
  because it significantly reduces IOPS for UFS devices. The latest UFS
  devices support more than 300 K IOPS.
* Serializing zoned writes in the I/O scheduler also reduces IOPS more than
  is acceptable.
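
The reordering hazard described in the list above can be illustrated with a
toy model. This is only a sketch and not kernel code: the function name
device_arrival_order() and its parameters are hypothetical, and the model
assumes that a command rejected with a unit attention condition is retried
behind the commands that were already in flight. It also ignores write
pointer checks; on a real sequential zone the later writes would presumably
be rejected rather than executed.

```python
from collections import deque


def device_arrival_order(lbas, unit_attention_at):
    """Return the order in which writes are executed in this toy model.

    lbas: start LBAs in submission order, all queued at once (no
        serialization by the host).
    unit_attention_at: index of the one command that the device rejects
        once with a unit attention condition, or None for no rejection.
    """
    queue = deque(enumerate(lbas))
    rejected_once = False
    order = []
    while queue:
        idx, lba = queue.popleft()
        if idx == unit_attention_at and not rejected_once:
            # Assumption: the retry lands behind commands already in flight.
            rejected_once = True
            queue.append((idx, lba))
            continue
        order.append(lba)
    return order


if __name__ == "__main__":
    # Four writes submitted in LBA order; the second one is rejected once.
    print(device_arrival_order([0, 8, 16, 24], unit_attention_at=1))
```

With a queue depth of one the retry would happen before the next write is
sent, which is the serialization that costs IOPS.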

Hence the approach of this patch series: support pipelining of zoned writes
even if no I/O scheduler has been configured.

I think the amount of complexity introduced by this patch series in the SCSI
core is reasonable. No new states are introduced in the SCSI core. A single
call to a function that reorders pending SCSI commands is introduced in the
SCSI error handler (scsi_call_prepare_resubmit()).
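
As a rough illustration of what reordering pending commands before
resubmission could look like, here is a minimal sketch. It is not the
implementation of scsi_call_prepare_resubmit(); the PendingWrite type, the
reorder_for_resubmit() helper and the choice of sorting by zone and start
LBA are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class PendingWrite:
    """Hypothetical stand-in for a pending zoned write command."""
    zone_start: int  # first LBA of the zone the write targets
    lba: int         # start LBA of the write
    nr_blocks: int   # length of the write in logical blocks


def reorder_for_resubmit(pending):
    """Return the pending writes sorted by zone, then by start LBA.

    Assumption: resubmitting writes in ascending LBA order per zone is
    what restores the sequential order a zoned device expects.
    """
    return sorted(pending, key=lambda w: (w.zone_start, w.lba))


if __name__ == "__main__":
    pending = [
        PendingWrite(zone_start=0, lba=16, nr_blocks=8),
        PendingWrite(zone_start=0, lba=0, nr_blocks=8),
        PendingWrite(zone_start=0, lba=8, nr_blocks=8),
    ]
    for w in reorder_for_resubmit(pending):
        print(w.zone_start, w.lba, w.nr_blocks)
```

The point of the sketch is that the reordering is a single sort over the
commands that are already pending and does not need any new command state.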

Thanks,

Bart.