Re: Problem with a disk device of type 'volume'

Hi,

I have made some progress in my investigation.

I created a minimal test case to reproduce the problem (see below).

I ran the test on three physical machines running Debian 11.4: the problem occurs on two of them but not on the third.

I booted one of the failing machines into a Debian 11.4 live OS and ran the test there: it works, no problem.

So far, all my tests lead me to the following conclusions:
 - The problem is tied to the configuration of the system.
 - It's not a file permission problem. The directory structure of the storage pool, the file permissions on this structure, the configuration of libvirt and qemu, and the user under which the daemon runs are the same on all systems.
 - I have run the test with libvirt 7.0.0 and qemu 5.2, and with libvirt 8.0.0 and qemu 7.0 (from Debian 11 backports). Both combinations behave the same way.
 - AppArmor is not the culprit (no error in the logs). I have also disabled it and the behavior is still the same.
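
For anyone comparing a working and a failing machine, here is a minimal sketch of one way to dump the settings that are actually active (non-blank, non-comment lines) in the libvirt configuration files so the results can be diffed. The function name effective_settings is hypothetical, and the file paths in the usage note are the usual Debian locations, which is an assumption about the systems involved.

```shell
#!/bin/sh
# effective_settings FILE...
# Hypothetical helper: print every line that is neither blank nor a comment,
# prefixed with the name of the file it came from. Unreadable files are skipped.
effective_settings() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        awk -v f="$f" '!/^[[:space:]]*(#|$)/ { print f ": " $0 }' "$f"
    done
}
```

Running `effective_settings /etc/libvirt/qemu.conf /etc/libvirt/libvirtd.conf > settings.txt` on each machine and comparing the files with diff would show any setting that differs. An empty result simply means every line in those files is commented out.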

I would appreciate any hint about what I should check to find the difference between the working systems and the failing ones.

Regards,
Fred

How to run the test (as root):

1/ Install libvirt & qemu if needed
apt install libvirt-daemon-system qemu-system-x86 virtinst

2/ Start libvirt daemon if needed
systemctl start libvirtd

3/ Create the default storage pool (if it is not created automatically)
virsh pool-define-as default dir - - - - /var/lib/libvirt/images/
virsh pool-build default
virsh pool-start default

5/ Download the Debian 11.4 generic cloud image and put it in the default storage pool
wget -O /var/lib/libvirt/images/debian.qcow2 https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-genericcloud-amd64.qcow2

6/ Refresh the default storage pool and check that the Debian image is visible.
virsh pool-refresh default
virsh vol-list --pool default

7/ Start the default network
virsh net-start default

8/ Create a VM based on the Debian 11.4 generic cloud image
virt-install -n TESTBUG --disk vol=default/debian.qcow2 --memory 1024 --import --noreboot --graphics none

9/ Start the VM, it should start and work fine
virsh start TESTBUG

10/ Stop the VM
virsh shutdown TESTBUG

11/ Change the disk definition to switch the disk type from 'file' to 'volume' and adapt the 'source' attributes accordingly.
virsh edit --domain TESTBUG

Change this section:
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source file="/var/lib/libvirt/images/debian.qcow2"/>
  <target dev="hda" bus="ide"/>
  <address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>

to:
<disk type="volume" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source pool="default" volume="debian.qcow2"/>
  <target dev="hda" bus="ide"/>
  <address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
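
To repeat this edit on several machines without going through the interactive editor, the same change can be applied as a filter on the domain XML. This is only a rough sketch: file_disk_to_volume is a hypothetical helper, it uses sed rather than a real XML parser, and it assumes the domain has exactly one file-backed disk and that the pool and volume names contain no characters special to sed (such as | or &).

```shell
#!/bin/sh
# file_disk_to_volume POOL VOLUME
# Hypothetical helper: read domain XML on stdin and write it to stdout with
#   <disk type='file' device='disk'>  rewritten to type='volume'
#   <source file='...'                rewritten to pool='POOL' volume='VOLUME'
# Accepts single or double quotes in the input. It rewrites every
# <source file=...> it finds, hence the single-disk assumption.
file_disk_to_volume() {
    pool=$1
    vol=$2
    sed -e "s|<disk type=['\"]file['\"] device=['\"]disk['\"]>|<disk type='volume' device='disk'>|" \
        -e "s|<source file=['\"][^'\"]*['\"]|<source pool='$pool' volume='$vol'|"
}
```

For the test case above, one possible use is `virsh dumpxml --inactive TESTBUG | file_disk_to_volume default debian.qcow2 > /tmp/TESTBUG.xml` followed by `virsh define /tmp/TESTBUG.xml`. Checking the generated file before defining it is advisable.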

12/ Start the VM again. It will either succeed or fail with the following error:
error creating libvirt domain: internal error: qemu unexpectedly closed the monitor: 2022-08-11T16:12:22.987252Z qemu-system-x86_64: -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/debian.qcow2","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/libvirt/images/debian.qcow2': Permission denied


On 13/08/2022 at 12:39, Frédéric Lespez wrote:
Hi,

I need some help to debug a problem with libvirt and a disk device of type 'volume'.

I have a VM failing to start with the following error:
$ virsh -c qemu:///system start server
error :Failed to start domain 'server'
error :internal error: process exited while connecting to monitor: 2022-08-13T09:26:50.121259Z qemu-system-x86_64: -blockdev {"driver":"file","filename":"/mnt/images/debian-11-genericcloud-amd64.qcow2","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}: Could not open '/mnt/images/debian-11-genericcloud-amd64.qcow2': Permission denied

I checked the file access permissions, and they are correct. I tried setting everything to 777 and running qemu as root, but the problem persists.
$ ll -d /mnt /mnt/images /mnt/images/*
drwxr-xr-x 9 root         root         4,0K 31 déc.   2021 /mnt
drwxr-xr-x 2 root         root         4,0K 13 août  11:31 /mnt/images
-rw-r--r-- 1 libvirt-qemu libvirt-qemu 242M 13 août  11:31 /mnt/images/debian-11-genericcloud-amd64.qcow2
-rw-r--r-- 1 libvirt-qemu libvirt-qemu 366K 13 août  11:31 /mnt/images/server_cloudinit.iso
-rw-r--r-- 1 libvirt-qemu libvirt-qemu 593M 13 août  11:59 /mnt/images/server_image.qcow2
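
Since the process opening the image has to traverse every directory on the way to it, the same check can be extended to each component of the path. A minimal sketch, assuming GNU coreutils stat; list_path_perms is a hypothetical helper name.

```shell
#!/bin/sh
# list_path_perms PATH
# Hypothetical helper: print mode, owner:group and name for PATH and for each
# of its parent directories, up to "/" (or "." for a relative path).
list_path_perms() {
    p=$1
    while :; do
        stat -c '%A %U:%G %n' "$p" || return 1
        case $p in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
}
```

For example, `list_path_perms /mnt/images/debian-11-genericcloud-amd64.qcow2` prints one line for the image, then for /mnt/images, /mnt and /. The util-linux command `namei -l` gives a similar listing.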

After a lot of searching and testing, I found out that the problem is linked to the disk device definition.
The disk device is defined like this:
<disk type="volume" device="disk">
   <driver name="qemu" type="qcow2"/>
   <source pool="TERRAFORM" volume="server_image.qcow2"/>
   <target dev="vda" bus="virtio"/>
   <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
</disk>

The image 'server_image.qcow2' uses a backing file:
$ qemu-img info /mnt/images/server_image.qcow2  --backing-chain
image: /mnt/stockage_rapide/VMs/terraform/puppetdev_server_image.qcow2
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 475 MiB
cluster_size: 65536
backing file: /mnt/images/debian-11-genericcloud-amd64.qcow2
backing file format: qcow2
Format specific information:
     compat: 0.10
     compression type: zlib
     refcount bits: 16

image: /mnt/images/debian-11-genericcloud-amd64.qcow2
file format: qcow2
virtual size: 2 GiB (2147483648 bytes)
disk size: 242 MiB
cluster_size: 65536
Format specific information:
     compat: 1.1
     compression type: zlib
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
     extended l2: false
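
Because qemu has to open every image in the chain, not just the top one, it can be useful to test each file in turn as the user qemu runs as. The following is a sketch under assumptions: chain_images and check_chain_readable are hypothetical helper names, the default user libvirt-qemu is taken from the file ownership shown earlier, and sudo is assumed to be available. It only tests ordinary file permissions as seen from the host shell, so a clean result does not rule out other causes.

```shell
#!/bin/sh
# chain_images
# Hypothetical helper: read `qemu-img info --backing-chain` text output on
# stdin and print the path from each "image:" line, one per line.
chain_images() {
    sed -n 's/^image: //p'
}

# check_chain_readable IMAGE [USER]
# For every image in the backing chain of IMAGE, report whether USER
# (default: libvirt-qemu, an assumption) can read it.
check_chain_readable() {
    img=$1
    user=${2:-libvirt-qemu}
    qemu-img info --backing-chain "$img" | chain_images |
    while IFS= read -r f; do
        # </dev/null keeps sudo from consuming the list of paths on stdin
        if sudo -u "$user" test -r "$f" </dev/null; then
            echo "readable:   $f"
        else
            echo "UNREADABLE: $f"
        fi
    done
}
```

Usage would be `check_chain_readable /mnt/images/server_image.qcow2`, which prints one line per image in the chain.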

And here is the definition of the associated storage pool:
<pool type="dir">
   <name>TERRAFORM</name>
   <uuid>dae00836-db4d-49ba-9d32-1f0278055516</uuid>
   <capacity unit="bytes">155674652672</capacity>
   <allocation unit="bytes">74396299264</allocation>
   <available unit="bytes">81278353408</available>
   <source>
   </source>
   <target>
     <path>/mnt/images</path>
     <permissions>
       <mode>0755</mode>
       <owner>0</owner>
       <group>0</group>
     </permissions>
   </target>
</pool>

If I change the disk device definition to this (and change only that), the domain starts and works fine (no permission problem!).
<disk type="file" device="disk">
   <driver name="qemu" type="qcow2"/>
   <source file="/mnt/images/server_image.qcow2"/>
   <target dev="vda" bus="virtio"/>
   <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
</disk>

Could you help me find out why the domain doesn't start when the disk device is of type 'volume'?
Thanks in advance for your help.

Regards,
Fred

Additional information:
- Running this on Debian 11 with libvirt 8.0.0 (from backports) and qemu 7.0 (from backports).
- Vanilla configuration of libvirt. I have just added my regular user to the libvirt group.
- The problem exists even if AppArmor is disabled.

PS: I want to use a disk device of type 'volume' because this domain is created by Terraform using the libvirt provider, which uses this kind of disk since it has some advantages. See the details here: https://github.com/dmacvicar/terraform-provider-libvirt/issues/126#issuecomment-480597050






