Re: Guest vm doesn't recover after the nfs connection resume

Liang Cong <lcong@xxxxxxxxxx> · Tue, 14 Dec 2021 15:35:42 +0800

Hi Daniel,
Thanks for your reply. I tried an NFS hard mount and got the same behavior as with the soft mount.
However, /var/log/messages now contains an NFS server recovery message that is not printed when mounting in soft mode.

Dec 14 02:12:47 test-1 kernel: nfs: server ip not responding, still trying
Dec 14 02:13:39 test-1 kernel: nfs: server ip not responding, timed out
Dec 14 02:14:34 test-1 kernel: nfs: server ip OK
Dec 14 02:14:34 test-1 kernel: NFS: __nfs4_reclaim_open_state: Lock reclaim failed!

As I understand it, the VM boot process will not return to normal until the guest is restarted (the VM stays in the running state the whole time and is never paused).
So this is not a libvirt or QEMU issue, just the expected behavior after an NFS connection timeout, right?

Thanks,
Liang Cong

On Thu, Dec 9, 2021 at 6:03 PM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote:
On Thu, Dec 09, 2021 at 05:54:15PM +0800, Liang Cong wrote:

> Dear developers:
>
> I found an issue during regular testing, and I could not confirm whether it is
> a libvirt/QEMU issue, an NFS client issue, or not an issue at all, so
> could you help check it?
> Below are the steps to reproduce the issue:
>
> 1. There is an NFS server with an exports file like:
> /nfs *(async,rw,no_root_squash)
> 2. The host machine soft-mounts the NFS export:
> mount nfs_server_ip:/nfs /var/lib/libvirt/images/nfs -o v4,soft
> 3. Start a guest VM with disk XML like below:
> <disk type='file' device='disk'>
>   <driver name='qemu' type='qcow2'/>
>   <source file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2' index='1'/>
>   <backingStore/>
>   <target dev='vda' bus='virtio'/>
>   <alias name='virtio-disk0'/>
> </disk>
> 4. Start the VM and, while the guest is booting, apply an iptables rule to
> drop the connection to the NFS server:
> iptables -A OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
> 5. Wait until this error appears in /var/log/messages:
> kernel: nfs: server nfs_server_ip not responding, timed out
> 6. Delete the iptables rule to restore the connection to the NFS server:
> iptables -D OUTPUT -d nfs_server_ip -p tcp --dport 2049 -j DROP
> 7. Check the guest VM: the boot process has failed with errors and cannot
> recover.
> error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector
> 0x7ab8 from `hd0'.
> error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector
> 0x9190 from `hd0'.
> error: ../../grub-core/disk/i386/pc/biosdisk.c:546:failure reading sector

So this shows that I/O errors have been sent from the host to the guest.

This means two things:

 - The host has reported I/O errors to QEMU
 - QEMU is configured to report I/O errors to the guest
   (rerror/werror attributes for disk config)

I expect the first point there is a result of you using 'soft' for
the NFS mount - try it again with 'hard'.

The alternative for 'rerror/werror' is to pause the guest, allowing
the host problem to be solved whereupon you unpause the guest.
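
As a minimal sketch of that alternative, assuming libvirt's 'error_policy'
and 'rerror_policy' attributes on the disk <driver> element (they are not
part of the config quoted above; they correspond to QEMU's werror/rerror
settings), the disk from the reproducer could be set to pause the guest on
I/O errors roughly like this:

```xml
<disk type='file' device='disk'>
  <!-- Assumption: 'stop' pauses the guest on write (error_policy) and
       read (rerror_policy) errors instead of reporting them to the guest -->
  <driver name='qemu' type='qcow2' error_policy='stop' rerror_policy='stop'/>
  <source file='/var/lib/libvirt/images/nfs/RHEL-8.6.0-20211102.1-x86_64.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

With a policy like this the guest should show up as paused once the host
reports an I/O error, and 'virsh resume' on the domain continues it after
the NFS problem is fixed. Whether the host reports an error at all, rather
than blocking, still depends on the NFS mount options.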

Overall this behaviour just looks like a result of your config
choices.

Regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|