Re: Guest vm doesn't recover after the nfs connection resume



On Thu, Dec 09, 2021 at 05:54:15PM +0800, Liang Cong wrote:
> Dear developers:
> I found one issue during regular test and I could not confirm whether it is
> a libvirt|qemu issue or it is a nfs client issue or it is not an issue, so
> could you help to check it?
> Below is the issue reproduce steps:
> 1.there is a nfs server with exports file like:
> /nfs *(async,rw,no_root_squash)
> 2. host machine soft mount nfs:
> mount nfs_server_ip:/nfs /var/lib/libvirt/images/nfs -o v4,soft
> 3. start a guest vm with disk tag xml like below:
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'/>
> <source
> file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2'
> index='1'/>
> <backingStore/>
> <target dev='vda' bus='virtio'/>
> <alias name='virtio-disk0'/>
> </disk>
> 4.Start the vm and during the guest vm boot, apply the iptables rule to
> drop the nfs connection to nfs server
> iptables -A OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
> 5. Wait until the error log appear in /var/log/message
> kernel: nfs: server nfs_server_ip not responding, timed out
> 6. delete the iptables rule to retain the connection to nfs server
> iptables -D OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
> 7. check the guest vm, found the boot process with error and can not
> recover.
> error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector
> 0x7ab8 from `hd0'.
> error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector
> 0x9190 from `hd0'.
> error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector

So this shows that I/O errors have been sent from the host to the guest.

This means two things:

 - The host has reported I/O errors to QEMU
 - QEMU is configured to report I/O errors to the guest
   (rerror/werror attributes for disk config)

I expect the first point there is a result of you using 'soft' for
the NFS mount - try it again with 'hard'.

The alternative for 'rerror/werror' is to pause the guest, allowing
the host problem to be solved whereupon you unpause the guest.
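
As a rough sketch of that alternative, assuming libvirt's error_policy
and rerror_policy attributes on the <driver> element (these are what map
to QEMU's werror/rerror), the disk from your report might be configured
to stop the guest on I/O errors roughly like this. The unrelated child
elements are omitted, and I have not tested this against your setup:

```xml
<disk type='file' device='disk'>
  <!-- error_policy applies to read and write errors unless rerror_policy
       is given, in which case rerror_policy takes over for read errors.
       'stop' pauses the guest instead of passing the error on to it. -->
  <driver name='qemu' type='qcow2' error_policy='stop' rerror_policy='stop'/>
  <source file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

With a policy like this the guest should show up as paused when the NFS
server stops responding, and once the connection is back it can be
continued with 'virsh resume' on the domain.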

Overall this behaviour just looks like a result of your config.

