On Wed, Sep 25, 2024 at 8:05 AM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote: > > This field is only valid when the VIRTIO_BLK_F_BLK_SIZE feature bit is > set. > > The blk_size field represents the smallest addressable unit of data that > can be read from or written to the device. It is always a power of two > and typically ranges from 512 bytes to larger values such as 4 KB. I agree that this is what the blk_size field probably *should* have meant (matching other storage protocols like ATA, SCSI, NVMe, ...), but I believe rewriting the spec like this changes the meaning of the blk_size field in an incompatible way. Previously, blk_size was described as the "optimal" sector size, so it seems clear that the driver is still allowed to send requests that are not aligned to blk_size ("awareness of the correct value can affect performance" but no mention of correctness). A device may need to do extra work to support this, of course, such as turning an unaligned write into a read-modify-write sequence if the underlying storage device doesn't support such an access directly, but such requests are still valid. It's also perfectly fine for a driver to ignore device-offered features like VIRTIO_BLK_F_BLK_SIZE and still operate correctly. > Linux/Windows systems typically use 512-byte/4-KB block sizes. While probably true, this doesn't really have anything to do with the change (and the logical block size is a property of the storage medium and controller, not anything to do with the operating system). > This description provides clarity on the constraints of the blk_size > field. > > Signed-off-by: Max Gurtovoy <mgurtovoy@xxxxxxxxxx> > --- > device-types/blk/description.tex | 20 ++++++++++++++++++-- > 1 file changed, 18 insertions(+), 2 deletions(-) > > diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex > index 2712ada..88a7591 100644 > --- a/device-types/blk/description.tex > +++ b/device-types/blk/description.tex > @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device} > The virtio block device is a simple virtual block device (ie. > disk). Read and write requests (and other exotic requests) are > placed in one of its queues, and serviced (probably out of order) by the > -device except where noted. > +device except where noted. Each block device consists of a sequence of logical > +blocks. A logical block represents the smallest addressable unit of data that > +can be read from or written to the device. The logical block size is always a > +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc. Does this imply that a device may return an error for a request where the sector or num_sectors fields are not aligned to the blk_size? If so, it should be called out as a more explicit normative requirement, but that would definitely be an incompatible change. The "smallest addressable unit" wording is also more confusing than clarifying here, since in virtio-blk, the "smallest addressable unit" is always 512 bytes, regardless of blk_size (even if a change like this would enforce limits on the sector values that would actually be allowed, it would still be possible to create a request that refers to a smaller unit than blk_size, unlike other storage protocols where logical block size changes the multiplier of the address, making it impossible to address smaller units). > \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID} > 2 > @@ -135,6 +138,9 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device / > present. The availability of the others all depend on various feature > bits as indicated above. > > +The field \field{blk_size} exists only if VIRTIO_BLK_F_BLK_SIZE is set. This field > +reports the logical block size of the device, expressed in bytes. Not a new problem with this change (see another example right below), but "set" is unclear - does that mean it is present if the device offered the feature, or only if the driver also accepted it? > The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies > the number of queues. > > @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic > \item The device size can be read from \field{capacity}. > > \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, > - \field{blk_size} can be read to determine the optimal sector size > + \field{blk_size} can be read to determine the logical block size > for the driver to use. This does not affect the units used in > the protocol (always 512 bytes), but awareness of the correct > value can affect performance. > @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic > > \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > > +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the > +device. Why is this a MUST? This is also an incompatible change since there was no such requirement before, and drivers are free to ignore features they don't care about. > +If the VIRTIO_BLK_F_BLK_SIZE feature is not offered by the device, then drivers > +MAY assume that the logical block size is 512 bytes. > + > Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of > sending VIRTIO_BLK_T_FLUSH commands. > [...] In my opinion, this change should probably be something more like a non-normative comment, along the lines of "blk_size is often the logical block size of the storage medium", without changing the existing meaning from previous spec versions. If there is a desire to introduce a hard requirement for logical-block-aligned I/O requests, that seems like it would have to be a new field and/or feature bit. Thanks, -- Daniel