Re: [PATCH] qemu: Adjust size for qcow2/qed if not on sector boundary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 05/07/2014 10:34 AM, Ján Tomko wrote:
> On 05/07/2014 01:58 PM, John Ferlan wrote:
>>
>>
>> On 04/08/2014 12:26 PM, John Ferlan wrote:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1002813
>>>
>>> If qemuDomainBlockResize() is passed a size not on a KiB boundary - that
>>> is passed a size based in bytes (VIR_DOMAIN_BLOCK_RESIZE_BYTES), then
>>> depending on the source format (qcow2 or qed), the value passed must
>>> be on a sector (or 512 byte) boundary. Since other libvirt code quietly
>>> adjusts the capacity values, then do so here as well - of course ensuring
>>> that adjustment still fits.
>>>
>>> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx>
>>> ---
>>>  src/qemu/qemu_driver.c | 22 ++++++++++++++++++++++
>>>  1 file changed, 22 insertions(+)
>>>
>>
>> Although discussion was taken off this list - the changes here were
>> ACK'd and pushed today...
> 
> I think ACKs should be on-list.
> 

Understood - Eric's CC/query to qemu-devel on his initial response got
no response other than my followup... So a more direct question was
asked to (I assume) someone more involved in the blockdev layer. We got
a direct response from that, but nothing else in general from qemu. So,
after a couple more weeks of receiving no feedback - I queried whether
the change was OK. Eric's response indicated to go ahead with this patch
- all these are attached to this response if interested. While the 3
letters weren't used, I took the response as implicitly being OK with
the change.

>>
>> Essentially the following API's will round up the value as well:
>>
>>     virStorageBackendCreateQcowCreate()
>>     virStorageBackendLogicalCreateVol()
>>     virStorageBackendCreateQemuImgCmd()
>>
>> For libvirt created volumes, virStorageBackendCreateQemuImgCmd() or
>> virStorageBackendCreateQcowCreate() is called - both will take the
>> capacity value and VIR_DIV_UP using 1024. For the vol-resize path (e.g.
>> non running vm case), virStorageBackendFilesystemResizeQemuImg() will
>> use ROUND_UP on 512 byte value because it knows (and comments) that
>> qemu-img will fail to resize on non sector boundaries.
>>
>> Additionally, it was noted that using "K" and "KiB" would produce 1024
>> based results, it's libvirt's allowance of "KB" for sizes that results
>> in the nuance. Being strict KB shouldn't be used for storage, but rather
>> than penalize for not knowing the difference between KiB and KB the code
>> just assumes KiB should have been used.
>>
>> John
>>
>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>> index 4bb4819..3e407d7 100644
>>> --- a/src/qemu/qemu_driver.c
>>> +++ b/src/qemu/qemu_driver.c
>>> @@ -9421,6 +9421,7 @@ qemuDomainBlockResize(virDomainPtr dom,
>>>      virDomainObjPtr vm;
>>>      qemuDomainObjPrivatePtr priv;
>>>      int ret = -1, idx;
>>> +    unsigned long long size_up;
>>>      char *device = NULL;
>>>      virDomainDiskDefPtr disk = NULL;
>>>  
>>> @@ -9467,6 +9474,21 @@ qemuDomainBlockResize(virDomainPtr dom,
>>>      }
>>>      disk = vm->def->disks[idx];
>>>  
>>> +    /* qcow2 and qed must be sized appropriately, so be sure our value
>>> +     * is sized appropriately and will fit
>>> +     */
>>> +    if (size != size_up &&
>>> +        (disk->src.format == VIR_STORAGE_FILE_QCOW2 ||
>>> +         disk->src.format == VIR_STORAGE_FILE_QED)) {
>>> +        if (size_up > ULLONG_MAX) {
> 
> This is always false.
> 

An OVERLy cautious check - I cannot remember what I was thinking about a
month ago... I think this was the check can VIR_ROUND_UP provide an
incorrect value. I can send a follow-up patch to remove those lines if
that's desired.

>>> +            virReportError(VIR_ERR_OVERFLOW,
>>> +                           _("size must be less than %llu KiB"),
>>> +                           ULLONG_MAX / 1024);
>>> +            goto endjob;
>>> +        }
>>> +        size = size_up;
> 
> Just a nitpick: rounding it up unconditionally here would get rid of the
> temporary variable and have no effect on values specified without the BYTES flag.
> 

Only qcow2 and qed have this issue regarding needing to be on a 512 byte
boundary. Since this is a generic routine I was limiting the rounding to
the two types from the bz rather than taking a chance that some generic
round up would cause some other issue.  Or am I misinterpreting your
comment?

John
--- Begin Message ---
Am 11.04.2014 um 16:09 hat Eric Blake geschrieben:
> [adding Kevin for a qemu perspective]
> 
> On 04/11/2014 07:23 AM, John Ferlan wrote:
> > On 04/08/2014 12:45 PM, Eric Blake wrote:
> >> On 04/08/2014 10:26 AM, John Ferlan wrote:
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1002813
> >>>
> 
> >>
> >> $ qemu-img create -f qcow2 img 12345
> >> Formatting 'img', fmt=qcow2 size=12345 encryption=off cluster_size=65536
> >> lazy_refcounts=off
> >> $ qemu-img info img
> >> image: img
> >> file format: qcow2
> >> virtual size: 12K (12288 bytes)
> >> disk size: 196K
> >> cluster_size: 65536
> >> Format specific information:
> >>     compat: 1.1
> >>     lazy refcounts: false
> >>
> >> Wait a second - qemu-img rounded DOWN.  That's wrong - it allocated less
> >> bytes than I requested.  I think we need to first figure out what's
> >> going on with the qemu side, on whether qemu should be supporting
> >> unaligned requestes, before trying to paper around it in libvirt.
> >>
> > 
> > Before you disappear with baby #4... Since I've seen no activity on the
> > qemu-devel list with regard to this question/issue, is there a chance
> > this moves forward or is some sort of redesign desired?
> 
> I'm still hoping we can get some feedback from the qemu team; Kevin,
> feel free to pass this on to other block maintainers if you are not the
> right person.

Generally, the qemu block layer works on arbitrary units of 512 bytes,
which it calls sectors, but have really nothing to do with the sectors
on the host disk (it could have a different sector size). The
consequence is that images that are not aligned to a sector boundary
will cause trouble.

I believe the correct behaviour for qemu would be that it errors out
when it can't deal with the exact number that was given. I don't think
silently rounding up or rounding down is appropriate.

> > I based my decision on what existing API's do, see:
> > 
> >     virStorageBackendCreateQcowCreate()
> >     virStorageBackendLogicalCreateVol()
> >     virStorageBackendCreateQemuImgCmd()
> > 
> > each will round up. I searched on VIR_DIV_UP() or VIR_ROUND_UP() - there
> > is no truncate or similar _DOWN macros...
> 
> That's because C already does truncating division without any additional
> effort.
> 
> > 
> > I did take a peek at the qemu code - I see that qcow2_truncate() will
> > enforce 512 byte blocks as does the qed_is_image_size_valid() code. The
> > 512 is a fairly standard value from my history (been so since my OpenVMS
> > days). It's really too bad there isn't an API that would tell us what
> > their block size is as I've also seen usage of multiples of 512.
> 
> I'm still trying to determine whether qemu's choice to require an exact
> cluster multiple for qcow2 sizes is a fundamental design limitation of
> the qcow2 format, or merely an implementation shortcut because no one
> was willing to support the last sector being a partial sector.  If the
> last sector does not have to be full, then it feels like there are qemu
> bugs that need to be addressed.

An exact multiple of the cluster size is not required as far as I can
tell, but for now we should enforce that it's a multiple of 512.

> > Considering both qed and qcow2 will fail without the right size - I just
> > don't see the value in making a call we know will fail only to fall back
> > with the rounded up (or down) size.
> 
> They may fail with unaligned sizes now, but if the file format
> fundamentally allows unaligned sizes and it is merely a missing
> implementation, then libvirt should be robust to the future day when
> unaligned sizes are implemented.

In theory that could happen, yes. There are probably no guest devices
that allow bytewise access, but with things like block jobs and the
built-in NBD servers there may be use cases where unaligned sizes could
have a valid use case.

Kevin

Attachment: pgp_bl5BLVUwO.pgp
Description: PGP signature


--- End Message ---
--- Begin Message ---

On 04/30/2014 09:37 AM, John Ferlan wrote:
> 
> 
> On 04/11/2014 10:09 AM, Eric Blake wrote:
>> [adding Kevin for a qemu perspective]
>>
>> On 04/11/2014 07:23 AM, John Ferlan wrote:
>>> On 04/08/2014 12:45 PM, Eric Blake wrote:
>>>> On 04/08/2014 10:26 AM, John Ferlan wrote:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1002813
>>>>>
>>
>>>>
>>>> $ qemu-img create -f qcow2 img 12345
>>>> Formatting 'img', fmt=qcow2 size=12345 encryption=off cluster_size=65536
>>>> lazy_refcounts=off
>>>> $ qemu-img info img
>>>> image: img
>>>> file format: qcow2
>>>> virtual size: 12K (12288 bytes)
>>>> disk size: 196K
>>>> cluster_size: 65536
>>>> Format specific information:
>>>>     compat: 1.1
>>>>     lazy refcounts: false
>>>>
>>>> Wait a second - qemu-img rounded DOWN.  That's wrong - it allocated less
>>>> bytes than I requested.  I think we need to first figure out what's
>>>> going on with the qemu side, on whether qemu should be supporting
>>>> unaligned requestes, before trying to paper around it in libvirt.
>>>>
>>>
>>> Before you disappear with baby #4... Since I've seen no activity on the
>>> qemu-devel list with regard to this question/issue, is there a chance
>>> this moves forward or is some sort of redesign desired?
>>
>> I'm still hoping we can get some feedback from the qemu team; Kevin,
>> feel free to pass this on to other block maintainers if you are not the
>> right person.
>>

ping?

I know you're out again starting tomorrow and I'd like to figure out
where things stand with this.

tks,

John
> 
> Given Kevin's feedback and the lack of comments on the qemu-devel list -
> where does this stand?  It is on the RHEL 6.6 bug list - so one way or
> another it seems there should be some disposition made.  While I don't
> disagree that qemu-img create does truncation, our code already does
> rounding up for any non aligned value passed.
> 
> For the create path - eventually virStorageBackendCreateQemuImgCmd() or
> virStorageBackendCreateQcowCreate() is called - both will take the
> capacity value and VIR_DIV_UP using 1024. For the vol-resize path,
> eventually virStorageBackendFilesystemResizeQemuImg() will use ROUND_UP
> on 512 byte value because it knows (and comments) that qemu-img will
> fail to resize on non sector boundaries.
> 
> If I follow the resize logic using a straight qemu-img example:
> 
> $ qemu-img create -f qcow2 /home/vm-images/disk-qcow2-5k.img 5000
> Formatting '/home/vm-images/disk-qcow2-5k.img', fmt=qcow2 size=5000
> encryption=off cluster_size=65536 lazy_refcounts=off
> $ qemu-img info /home/vm-images/disk-qcow2-5k.img
> image: /home/vm-images/disk-qcow2-5k.img
> file format: qcow2
> virtual size: 4.5K (4608 bytes)
> disk size: 196K
> cluster_size: 65536
> $ qemu-img resize /home/vm-images/disk-qcow2-5k.img 6000
> qemu-img: The new size must be a multiple of 512
> qemu-img: Error resizing image (22)
> $ qemu-img resize /home/vm-images/disk-qcow2-5k.img 6K
> Image resized.
> $ qemu-img info /home/vm-images/disk-qcow2-5k.img
> image: /home/vm-images/disk-qcow2-5k.img
> file format: qcow2
> virtual size: 6.0K (6144 bytes)
> disk size: 196K
> cluster_size: 65536
> $
> 
> NOTE there is a difference of course between 5000 and 5k:
> $ qemu-img create -f qcow2 /home/vm-images/disk-qcow2-5k.img 5k
> $ qemu-img info /home/vm-images/disk-qcow2-5k.img
> image: /home/vm-images/disk-qcow2-5k.img
> file format: qcow2
> virtual size: 5.0K (5120 bytes)
> disk size: 196K
> cluster_size: 65536
> $
> 
> For qed it's:
> 
> $ qemu-img create -f qed /home/vm-images/disk-qed-5k.img 5000
> Formatting '/home/vm-images/disk-qed-5k.img', fmt=qed size=5000
> cluster_size=65536 table_size=0
> QED image size must be a non-zero multiple of cluster size and less than
> 70368744177664 bytes
> qemu-img: /home/vm-images/disk-qed-5k.img: error while creating qed:
> Invalid argument
> $ qemu-img create -f qed /home/vm-images/disk-qed-5k.img 5K
> Formatting '/home/vm-images/disk-qed-5k.img', fmt=qed size=5120
> cluster_size=65536 table_size=0
> $ qemu-img info /home/vm-images/disk-qed-5k.img
> image: /home/vm-images/disk-qed-5k.img
> file format: qed
> virtual size: 5.0K (5120 bytes)
> disk size: 260K
> cluster_size: 65536
> $ qemu-img resize /home/vm-images/disk-qed-5k.img 6000
> qemu-img: Error resizing image (22)
> $ qemu-img resize /home/vm-images/disk-qed-5k.img 6K
> Image resized.
> $ qemu-img info /home/vm-images/disk-qed-5k.img
> image: /home/vm-images/disk-qed-5k.img
> file format: qed
> virtual size: 6.0K (6144 bytes)
> disk size: 260K
> cluster_size: 65536
> $
> 
> In the end, qemu-img 'assumes' "K" is KiB essentially, while libvirt
> makes the distinction of KB and/or KiB. Although a just providing k
> would assume KiB. We're only "penalizing" the user that doesn't
> understand that KB is different than KiB.
> 
> Also for comparison (assuming TestPool is created/active):
> 
> $ virsh vol-create-as TestPool disk-qcow-5k.img 5k --format qcow2
> $ virsh vol-create-as TestPool disk-qcow-5kb.img 5kb --format qcow2
> $ virsh vol-create-as TestPool disk-qcow-5kib.img 5kib --format qcow2
> $ qemu-img info TestPool/disk-qcow-5k.img
> image: TestPool/disk-qcow-5k.img
> file format: qcow2
> virtual size: 5.0K (5120 bytes)
> disk size: 196K
> cluster_size: 65536
> $ qemu-img info TestPool/disk-qcow-5kb.img
> image: TestPool/disk-qcow-5kb.img
> file format: qcow2
> virtual size: 5.0K (5120 bytes)
> disk size: 196K
> cluster_size: 65536
> $ qemu-img info TestPool/disk-qcow-5kib.img
> image: TestPool/disk-qcow-5kib.img
> file format: qcow2
> virtual size: 5.0K (5120 bytes)
> disk size: 196K
> cluster_size: 65536
> $
> 
> 
> John
> 
>>>
>>> I based my decision on what existing API's do, see:
>>>
>>>     virStorageBackendCreateQcowCreate()
>>>     virStorageBackendLogicalCreateVol()
>>>     virStorageBackendCreateQemuImgCmd()
>>>
>>> each will round up. I searched on VIR_DIV_UP() or VIR_ROUND_UP() - there
>>> is no truncate or similar _DOWN macros...
>>
>> That's because C already does truncating division without any additional
>> effort.
>>
>>>
>>> I did take a peek at the qemu code - I see that qcow2_truncate() will
>>> enforce 512 byte blocks as does the qed_is_image_size_valid() code. The
>>> 512 is a fairly standard value from my history (been so since my OpenVMS
>>> days). It's really too bad there isn't an API that would tell us what
>>> their block size is as I've also seen usage of multiples of 512.
>>
>> I'm still trying to determine whether qemu's choice to require an exact
>> cluster multiple for qcow2 sizes is a fundamental design limitation of
>> the qcow2 format, or merely an implementation shortcut because no one
>> was willing to support the last sector being a partial sector.  If the
>> last sector does not have to be full, then it feels like there are qemu
>> bugs that need to be addressed.
>>
>>>
>>> Considering both qed and qcow2 will fail without the right size - I just
>>> don't see the value in making a call we know will fail only to fall back
>>> with the rounded up (or down) size.
>>
>> They may fail with unaligned sizes now, but if the file format
>> fundamentally allows unaligned sizes and it is merely a missing
>> implementation, then libvirt should be robust to the future day when
>> unaligned sizes are implemented.
>>
>>

--- End Message ---
--- Begin Message ---
On 04/30/2014 07:37 AM, John Ferlan wrote:
> 
> 
> On 04/11/2014 10:09 AM, Eric Blake wrote:
>> [adding Kevin for a qemu perspective]
>>
> Given Kevin's feedback and the lack of comments on the qemu-devel list -
> where does this stand?  It is on the RHEL 6.6 bug list - so one way or
> another it seems there should be some disposition made.  While I don't
> disagree that qemu-img create does truncation, our code already does
> rounding up for any non aligned value passed.

Regardless of what qemu ends up doing (even if they keep the current
behavior of rounding down, although I argue that is a bug in qemu),
libvirt should be guaranteeing the user at least as much storage as they
asked for, by rounding up if need be.

> 
> For the create path - eventually virStorageBackendCreateQemuImgCmd() or
> virStorageBackendCreateQcowCreate() is called - both will take the
> capacity value and VIR_DIV_UP using 1024. For the vol-resize path,
> eventually virStorageBackendFilesystemResizeQemuImg() will use ROUND_UP
> on 512 byte value because it knows (and comments) that qemu-img will
> fail to resize on non sector boundaries.

If qemu-img learns how to resize to a non-aligned size, hopefully it
will also include a witness we can probe to know about that new feature.
 In the meantime, rounding up is appropriate.


> 
> NOTE there is a difference of course between 5000 and 5k:
> $ qemu-img create -f qcow2 /home/vm-images/disk-qcow2-5k.img 5k
> $ qemu-img info /home/vm-images/disk-qcow2-5k.img
> image: /home/vm-images/disk-qcow2-5k.img
> file format: qcow2
> virtual size: 5.0K (5120 bytes)
> disk size: 196K
> cluster_size: 65536
> $

> 
> In the end, qemu-img 'assumes' "K" is KiB essentially, while libvirt
> makes the distinction of KB and/or KiB. Although a just providing k
> would assume KiB. We're only "penalizing" the user that doesn't
> understand that KB is different than KiB.

Both libvirt and qemu treat 'k' as 'KiB' (multiples of 1024); it's just
that libvirt additionally accepts "kB" (multiples of 1000) as well as
the proper spelling "KiB" (qemu only accepts the one-letter spelling).


>>>
>>> Considering both qed and qcow2 will fail without the right size - I just
>>> don't see the value in making a call we know will fail only to fall back
>>> with the rounded up (or down) size.
>>
>> They may fail with unaligned sizes now, but if the file format
>> fundamentally allows unaligned sizes and it is merely a missing
>> implementation, then libvirt should be robust to the future day when
>> unaligned sizes are implemented.

At this point, we can assume that qemu either won't change (to pardon my
pun, the silence from qemu side speaks volumes, unfortunately); but if
it does, we'll try hard to ensure they give us something to observe the
new feature and teach libvirt to cope at that time.  For now, just go
ahead with the patch to libvirt that rounds up to aligned sizes.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


--- End Message ---
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]