Re: [PATCH] nvme: Fix io_opt limit setting

Keith Busch <kbusch@xxxxxxxxxx> · Thu, 14 May 2020 13:12:15 +0900

On Thu, May 14, 2020 at 03:47:56AM +0000, Damien Le Moal wrote:
> On 2020/05/14 12:40, Keith Busch wrote:
> > On Thu, May 14, 2020 at 10:54:52AM +0900, Damien Le Moal wrote:
> >> Currently, a namespace io_opt queue limit is set by default to the
> >> physical sector size of the namespace and to the the write optimal
> >> size (NOWS) when the namespace reports this value. This causes problems
> >> with block limits stacking in blk_stack_limits() when a namespace block
> >> device is combined with an HDD which generally do not report any optimal
> >> transfer size (io_opt limit is 0). The code:
> >>
> >> /* Optimal I/O a multiple of the physical block size? */
> >> if (t->io_opt & (t->physical_block_size - 1)) {
> >> 	t->io_opt = 0;
> >> 	t->misaligned = 1;
> >> 	ret = -1;
> >> }
> >>
> >> results in blk_stack_limits() to return an error when the combined
> >> devices have different but compatible physical sector sizes (e.g. 512B
> >> sector SSD with 4KB sector disks).
> >>
> >> Fix this by not setting the optiomal IO size limit if the namespace does
> >> not report an optimal write size value.
> > 
> > Won't this continue to break if a controller does report NOWS that's not
> > a multiple of the physical block size of the device it's stacking with?
> 
> When io_opt stacking is handled, the physical sector size for the stacked device
> is already resolved to a common value. If the NOWS value cannot accommodate this
> resolved physical sector size, this is an incompatible stacking, so failing is
> OK in that case.

I see, though it's not strictly incompatible as io_opt is merely a hint
that could continue to work if the stacked limit was recalculated as:

	if (t->io_opt & (t->physical_block_size - 1))
	 	t->io_opt = lcm(t->io_opt, t->physical_block_size);

Regardless, your patch does make sense, but it does have a merge
conflict with nvme-5.8.