[no subject]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(This could probably use clarification that the "offset" is in bytes,
but it still seems pretty clear that it does not depend on blk_size.)

So a request with sector = 1 always means an offset of 512 bytes from
the beginning of the block device, even if the advertised blk_size is
4096.

This is different from (e.g.) a SCSI READ where the logical block
address is in units of the block length returned by READ CAPACITY, so
it is impossible to send a request that is not block size aligned.

[...]
>>> diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex
>>> index 2712ada..88a7591 100644
>>> --- a/device-types/blk/description.tex
>>> +++ b/device-types/blk/description.tex
>>> @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device}
>>>  The virtio block device is a simple virtual block device (ie.
>>>  disk). Read and write requests (and other exotic requests) are
>>>  placed in one of its queues, and serviced (probably out of order) by the
>>> -device except where noted.
>>> +device except where noted. Each block device consists of a sequence of logical
>>> +blocks. A logical block represents the smallest addressable unit of data that
>>> +can be read from or written to the device. The logical block size is always a
>>> +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc.
>>
>> Does this imply that a device may return an error for a request where
>> the sector or num_sectors fields are not aligned to the blk_size? If
>> so, it should be called out as a more explicit normative requirement,
>> but that would definitely be an incompatible change.
>
> As mentioned, the logical block size, such as 512 bytes, defines the smallest addressable unit for I/O operations on the storage device.
> This has important implications for I/O requests:
> 1. Minimum I/O size: You cannot send I/O requests smaller than the logical block size. For example, with a 512-byte logical block size, requests of 128 bytes are not valid.

128-byte requests are not valid, certainly, but that is already
covered by the wording in the spec ("in multiples of 512 bytes" in the
VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT description, and "The length of data
MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN and
VIRTIO_BLK_T_OUT requests" in the driver requirements).

> 2. I/O alignment: All I/O requests must be aligned to the logical block size. This means the target start address (on the media device) of the I/O must be a multiple of the logical block size (the expressed units in the spec are not relevant).

Unless there is a change to the spec (like this proposed patch), I
don't think this is required. I suppose a device that can't manage to
perform a misaligned I/O could choose to deny such requests and return
VIRTIO_BLK_S_IOERR, but this is not explicitly allowed or denied by
the current spec.

> 3. I/O size granularity: The size of I/O requests must be a multiple of the logical block size. For a 512-byte logical block size, valid I/O sizes would be 512 bytes, 1024 bytes, 1536 bytes, and so on.

Same as point 1 above, the only requirement (that I can see, anyway)
is that the I/O request has to be a multiple of 512 bytes.

>> The "smallest addressable unit" wording is also more confusing than
>> clarifying here, since in virtio-blk, the "smallest addressable unit"
>> is always 512 bytes, regardless of blk_size (even if a change like
>> this would enforce limits on the sector values that would actually be
>> allowed, it would still be possible to create a request that refers to
>> a smaller unit than blk_size, unlike other storage protocols where
>> logical block size changes the multiplier of the address, making it
>> impossible to address smaller units).
>
> Can you please point me to the place in spec that explicitly say that "in virtio-blk, the smallest addressable unit is always 512 bytes, regardless of blk_size" ?

The description of struct virtio_blk_req's sector field:
"The sector number indicates the offset (multiplied by 512) where the
read or write is to occur."

Unless we have different definitions of "smallest addressable unit",
that seems pretty clear. It is always possible to address a part of
the disk in 512-byte multiples (e.g. sectors 1 through 3) even when
the blk_size is larger than 512. Maybe those requests won't succeed,
but it is always possible to describe a region of the block device
with an address that is not a multiple of blk_size.

> I didn't find it in the spec and I don't agree with this call if it is implicit.

I agree it should be clarified. The current wording requires too much
reading between the lines.

> The implementation of the Linux kernel driver isn't implemented in this manner as well.

When filling the sector field, the Linux virtio-blk driver uses the
value of blk_rq_pos(), which is also always in units of 512 bytes, as
far as I can tell (it returns sector_t, which is in units of 512 bytes
per the definition in include/linux/blk_types.h).

[...]
>>> @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
>>>  \item The device size can be read from \field{capacity}.
>>>
>>>  \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated,
>>> -  \field{blk_size} can be read to determine the optimal sector size
>>> +  \field{blk_size} can be read to determine the logical block size
>>>    for the driver to use. This does not affect the units used in
>>>    the protocol (always 512 bytes), but awareness of the correct
>>>    value can affect performance.
>>> @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
>>>
>>>  \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
>>>
>>> +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the
>>> +device.
>>
>> Why is this a MUST? This is also an incompatible change since there
>> was no such requirement before, and drivers are free to ignore
>> features they don't care about.
>
> I can say SHOULD, but then the driver that is ignoring this feature may guess/assume the logical block size and it can be wrong.
>
> This may lead to an undefined behavior.

Unless the spec is changed, if a device can't correctly handle
non-blk_size-aligned requests, this seems like it would be a device
bug ("undefined behavior") or at least non-spec-derived limitation
(returning an error would not be undefined, but also probably wouldn't
result in a working system).

Certainly it would be preferable for the driver to send aligned
requests, but the existing wording seems to cover that pretty well
already.

Thanks,
-- Daniel





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux