On 6/20/24 14:53, John Garry wrote:
From: Alan Adamson <alan.adamson@xxxxxxxxxx> Add support to set block layer request_queue atomic write limits. The limits will be derived from either the namespace or controller atomic parameters. NVMe atomic-related parameters are grouped into "normal" and "power-fail" (or PF) class of parameter. For atomic write support, only PF parameters are of interest. The "normal" parameters are concerned with racing reads and writes (which also applies to PF). See NVM Command Set Specification Revision 1.0d section 2.1.4 for reference. Whether to use per namespace or controller atomic parameters is decided by NSFEAT bit 1 - see Figure 97: Identify – Identify Namespace Data Structure, NVM Command Set. NVMe namespaces may define an atomic boundary, whereby no atomic guarantees are provided for a write which straddles this per-lba space boundary. The block layer merging policy is such that no merges may occur in which the resultant request would straddle such a boundary. Unlike SCSI, NVMe specifies no granularity or alignment rules, apart from atomic boundary rule. In addition, again unlike SCSI, there is no dedicated atomic write command - a write which adheres to the atomic size limit and boundary is implicitly atomic. If NSFEAT bit 1 is set, the following parameters are of interest: - NAWUPF (Namespace Atomic Write Unit Power Fail) - NABSPF (Namespace Atomic Boundary Size Power Fail) - NABO (Namespace Atomic Boundary Offset) and we set request_queue limits as follows: - atomic_write_unit_max = rounddown_pow_of_two(NAWUPF) - atomic_write_max_bytes = NAWUPF - atomic_write_boundary = NABSPF If in the unlikely scenario that NABO is non-zero, then atomic writes will not be supported at all as dealing with this adds extra complexity. This policy may change in future. In all cases, atomic_write_unit_min is set to the logical block size. If NSFEAT bit 1 is unset, the following parameter is of interest: - AWUPF (Atomic Write Unit Power Fail) and we set request_queue limits as follows: - atomic_write_unit_max = rounddown_pow_of_two(AWUPF) - atomic_write_max_bytes = AWUPF - atomic_write_boundary = 0 A new function, nvme_valid_atomic_write(), is also called from submission path to verify that a request has been submitted to the driver will actually be executed atomically. As mentioned, there is no dedicated NVMe atomic write command (which may error for a command which exceeds the controller atomic write limits). Note on NABSPF: There seems to be some vagueness in the spec as to whether NABSPF applies for NSFEAT bit 1 being unset. Figure 97 does not explicitly mention NABSPF and how it is affected by bit 1. However Figure 4 does tell to check Figure 97 for info about per-namespace parameters, which NABSPF is, so it is implied. However currently nvme_update_disk_info() does check namespace parameter NABO regardless of this bit. Signed-off-by: Alan Adamson <alan.adamson@xxxxxxxxxx> Reviewed-by: Keith Busch <kbusch@xxxxxxxxxx> Reviewed-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx> jpg: total rewrite Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx> --- drivers/nvme/host/core.c | 52 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+)
Reviewed-by: Hannes Reinecke <hare@xxxxxxx> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@xxxxxxx +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich