On 3/20/24 07:49, Christoph Hellwig wrote:
> Can you try this patch instead?
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 00864a63447099..4bac54d4e0015b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2204,6 +2204,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  	}
>  
>  	if (!ret && nvme_ns_head_multipath(ns->head)) {
> +		struct queue_limits *ns_lim = &ns->disk->queue->limits;
>  		struct queue_limits lim;
>  
>  		blk_mq_freeze_queue(ns->head->disk->queue);
> @@ -2215,7 +2216,26 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
>  		nvme_mpath_revalidate_paths(ns);
>  
> +		/*
> +		 * queue_limits mixes values that are the hardware limitations
> +		 * for bio splitting with what is the device configuration.
> +		 *
> +		 * For NVMe the device configuration can change after e.g. a
> +		 * Format command, and we really want to pick up the new format
> +		 * value here.  But we must still stack the queue limits to the
> +		 * least common denominator for multipathing to split the bios
> +		 * properly.
> +		 *
> +		 * To work around this, we explicitly set the device
> +		 * configuration to those that we just queried, but only stack
> +		 * the splitting limits in to make sure we still obey possibly
> +		 * lower limitations of other controllers.
> +		 */
>  		lim = queue_limits_start_update(ns->head->disk->queue);
> +		lim.logical_block_size = ns_lim->logical_block_size;
> +		lim.physical_block_size = ns_lim->physical_block_size;
> +		lim.io_min = ns_lim->io_min;
> +		lim.io_opt = ns_lim->io_opt;
>  		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
>  					ns->head->disk->disk_name);
>  		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
>

I have just tested the above patch and it's working as expected. With the
above patch, I don't see any issue formatting the NVMe disk with block-size
of 512. Looks good to me.

Thanks,
--Nilay

PS: For reference, please find below test result obtained using the above patch.

--------------------------------------------------------------------------------

# lspci
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X

# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S6EUNA0R500358       1.6TB NVMe Gen4 U.2 SSD                  0x1        1.60 TB / 1.60 TB          4 KiB + 0 B      REV.SN49

# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze    : 0xba4d4ab0
ncap    : 0xba4d4ab0
nuse    : 0xba4d4ab0
nsfeat  : 0
  [4:4] : 0	NPWG, NPWA, NPDG, NPDA, and NOWS are Not Supported
  [3:3] : 0	NGUID and EUI64 fields if non-zero, Reused
  [2:2] : 0	Deallocated or Unwritten Logical Block error Not Supported
  [1:1] : 0	Namespace uses AWUN, AWUPF, and ACWU
  [0:0] : 0	Thin Provisioning Not Supported
<snip>
<snip>
nlbaf   : 4
flbas   : 0
  [6:5] : 0	Most significant 2 bits of Current LBA Format Selected
  [4:4] : 0	Metadata Transferred in Separate Contiguous Buffer
  [3:0] : 0	Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
LBA Format  1 : Metadata Size: 8   bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format  2 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better
LBA Format  3 : Metadata Size: 8   bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format  4 : Metadata Size: 64  bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded

# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 4096 0 4096 4096 0 128 0B
^^^ ^^^

<< The nvme disk has block size of 4096; now format it with block size of 512

# nvme format /dev/nvme0n1 --lbaf=2 --pil=0 --ms=0 --pi=0 -f
Success formatting namespace:1

>> Success formatting; no error seen

# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 512 0 512 512 0 128 0B
^^^ ^^^

# cat /sys/block/nvme0n1/queue/logical_block_size:512
# cat /sys/block/nvme0n1/queue/physical_block_size:512
# cat /sys/block/nvme0n1/queue/optimal_io_size:0
# cat /sys/block/nvme0n1/queue/minimum_io_size:512

# cat /sys/block/nvme0c0n1/queue/logical_block_size:512
# cat /sys/block/nvme0c0n1/queue/physical_block_size:512
# cat /sys/block/nvme0c0n1/queue/optimal_io_size:0
# cat /sys/block/nvme0c0n1/queue/minimum_io_size:512

# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze    : 0xba4d4ab0
ncap    : 0xba4d4ab0
nuse    : 0xba4d4ab0
nsfeat  : 0
  [4:4] : 0	NPWG, NPWA, NPDG, NPDA, and NOWS are Not Supported
  [3:3] : 0	NGUID and EUI64 fields if non-zero, Reused
  [2:2] : 0	Deallocated or Unwritten Logical Block error Not Supported
  [1:1] : 0	Namespace uses AWUN, AWUPF, and ACWU
  [0:0] : 0	Thin Provisioning Not Supported
<snip>
<snip>
nlbaf   : 4
flbas   : 0x2
  [6:5] : 0	Most significant 2 bits of Current LBA Format Selected
  [4:4] : 0	Metadata Transferred in Separate Contiguous Buffer
  [3:0] : 0x2	Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format  1 : Metadata Size: 8   bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format  2 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format  3 : Metadata Size: 8   bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format  4 : Metadata Size: 64  bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
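
For anyone following along, here is a rough userspace model of why the ordering in the quoted patch matters. It is only a sketch under stated assumptions, not the kernel code: `struct model_limits`, `model_stack()` and `model_update_head()` are hypothetical names, the struct is a cut-down stand-in for `struct queue_limits` (`io_opt` is left out for brevity), and it assumes that stacking keeps the larger of the two block-size values and the smaller non-zero splitting limit.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for the kernel's struct queue_limits. */
struct model_limits {
	unsigned int logical_block_size;	/* device configuration */
	unsigned int physical_block_size;	/* device configuration */
	unsigned int io_min;			/* device configuration */
	unsigned int max_sectors;		/* splitting limit */
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Smaller of two values, treating 0 as "no limit set". */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Assumed stacking rule: block sizes on the top device can only grow,
 * splitting limits can only shrink.
 */
static void model_stack(struct model_limits *top,
			const struct model_limits *bottom)
{
	top->logical_block_size = max_u(top->logical_block_size,
					bottom->logical_block_size);
	top->physical_block_size = max_u(top->physical_block_size,
					 bottom->physical_block_size);
	top->io_min = max_u(top->io_min, bottom->io_min);
	top->max_sectors = min_not_zero_u(top->max_sectors,
					  bottom->max_sectors);
}

/*
 * Ordering from the quoted patch: take the freshly queried device
 * configuration first, then stack so the splitting limits still apply.
 */
static void model_update_head(struct model_limits *head,
			      const struct model_limits *ns)
{
	head->logical_block_size = ns->logical_block_size;
	head->physical_block_size = ns->physical_block_size;
	head->io_min = ns->io_min;
	model_stack(head, ns);
}
```

With a head that still carries the old 4096-byte values and a path that now reports 512, calling `model_stack()` alone leaves the head at 4096, while `model_update_head()` brings it down to 512 and still keeps the lower `max_sectors`. That matches the 4096 to 512 change seen in the lsblk output above.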