Re: [PATCH 1/1] virtio-blk: Add description for blk_size field

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 25, 2024 at 8:05 AM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
>
> This field is only valid when the VIRTIO_BLK_F_BLK_SIZE feature bit is
> set.
>
> The blk_size field represents the smallest addressable unit of data that
> can be read from or written to the device. It is always a power of two
> and typically ranges from 512 bytes to larger values such as 4 KB.

I agree that this is what the blk_size field probably *should* have
meant (matching other storage protocols like ATA, SCSI, NVMe, ...),
but I believe rewriting the spec like this changes the meaning of the
blk_size field in an incompatible way.

Previously, blk_size was described as the "optimal" sector size, so it
seems clear that the driver is still allowed to send requests that are
not aligned to blk_size ("awareness of the correct value can affect
performance" but no mention of correctness). A device may need to do
extra work to support this, of course, such as turning an unaligned
write into a read-modify-write sequence if the underlying storage
device doesn't support such an access directly, but such requests are
still valid. It's also perfectly fine for a driver to ignore
device-offered features like VIRTIO_BLK_F_BLK_SIZE and still operate
correctly.

> Linux/Windows systems typically use 512-byte/4-KB block sizes.

While probably true, this doesn't really have anything to do with the
change (and the logical block size is a property of the storage medium
and controller, not anything to do with the operating system).

> This description provides clarity on the constraints of the blk_size
> field.
>
> Signed-off-by: Max Gurtovoy <mgurtovoy@xxxxxxxxxx>
> ---
>  device-types/blk/description.tex | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex
> index 2712ada..88a7591 100644
> --- a/device-types/blk/description.tex
> +++ b/device-types/blk/description.tex
> @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device}
>  The virtio block device is a simple virtual block device (ie.
>  disk). Read and write requests (and other exotic requests) are
>  placed in one of its queues, and serviced (probably out of order) by the
> -device except where noted.
> +device except where noted. Each block device consists of a sequence of logical
> +blocks. A logical block represents the smallest addressable unit of data that
> +can be read from or written to the device. The logical block size is always a
> +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc.

Does this imply that a device may return an error for a request where
the sector or num_sectors fields are not aligned to the blk_size? If
so, it should be called out as a more explicit normative requirement,
but that would definitely be an incompatible change.

The "smallest addressable unit" wording is also more confusing than
clarifying here, since in virtio-blk, the "smallest addressable unit"
is always 512 bytes, regardless of blk_size (even if a change like
this would enforce limits on the sector values that would actually be
allowed, it would still be possible to create a request that refers to
a smaller unit than blk_size, unlike other storage protocols where
logical block size changes the multiplier of the address, making it
impossible to address smaller units).

>  \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
>    2
> @@ -135,6 +138,9 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
>  present. The availability of the others all depend on various feature
>  bits as indicated above.
>
> +The field \field{blk_size} exists only if VIRTIO_BLK_F_BLK_SIZE is set. This field
> +reports the logical block size of the device, expressed in bytes.

Not a new problem with this change (see another example right below),
but "set" is unclear - does that mean it is present if the device
offered the feature, or only if the driver also accepted it?

>  The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies
>  the number of queues.
>
> @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
>  \item The device size can be read from \field{capacity}.
>
>  \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated,
> -  \field{blk_size} can be read to determine the optimal sector size
> +  \field{blk_size} can be read to determine the logical block size
>    for the driver to use. This does not affect the units used in
>    the protocol (always 512 bytes), but awareness of the correct
>    value can affect performance.
> @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
>
>  \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
>
> +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the
> +device.

Why is this a MUST? This is also an incompatible change since there
was no such requirement before, and drivers are free to ignore
features they don't care about.

> +If the VIRTIO_BLK_F_BLK_SIZE feature is not offered by the device, then drivers
> +MAY assume that the logical block size is 512 bytes.
> +
>  Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of
>  sending VIRTIO_BLK_T_FLUSH commands.
>
[...]

In my opinion, this change should probably be something more like a
non-normative comment, along the lines of "blk_size is often the
logical block size of the storage medium", without changing the
existing meaning from previous spec versions. If there is a desire to
introduce a hard requirement for logical-block-aligned I/O requests,
that seems like it would have to be a new field and/or feature bit.

Thanks,
-- Daniel





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux