Re: VM paused during multipath iSCSI reservation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thursday, 14 October 2021 09:28:12 CEST Michal Prívozník wrote:
> On 10/13/21 2:54 PM, Vojtech Juranek wrote:
> > Hi,
> > I'm trying to find the root cause for BZ #1898049 [1]. When setting up
> > Windows HA cluster on Windows Server VMs run on top of oVirt, Windows
> > cluster validator runs couple of tests and fails during test "Validate
> > SCSI-3 Persistent Reservation" and one of the VMs of> 
> > the cluster is paused with IO error. Disk definition is as follows:
> >     <disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
> >     
> >       <driver name='qemu' type='raw' cache='none' error_policy='stop'
> >       io='native'/> <source
> >       dev='/dev/mapper/3600a09803830447a4f244c4657616f6f' index='1'>>       
> >         <seclabel model='dac' relabel='no'/>
> >         <reservations managed='yes'>
> >         
> >           <source type='unix'
> >           path='/var/lib/libvirt/qemu/domain-1-Windows-2016-2/pr-helper0.
> >           sock' mode='client'/>>         
> >         </reservations>
> >       
> >       </source>
> >       <backingStore/>
> >       <target dev='sdb' bus='scsi'/>
> >       <shareable/>
> >       <alias name='ua-26b4975e-e1d4-4e27-b2c6-2ea0894a571b'/>
> >       <address type='drive' controller='0' bus='0' target='0' unit='1'/>
> >     
> >     </disk>
> > 
> > and libvirt error I get is bellow [2].
> > 
> > When I try to create reservation from Windows VM manually, I get following
> > error (but not sure I do it whole process correctly):
> > 
> > .\sg_persist --out --register --param-sark=123abc e:
> >     QEMU   QEMU HARDDISK               2.5+
> >     Peripheral device type: disk
> > 
> > PR out (Register): command not supported
> > sg_persist failed: Illegal request, Invalid opcode
> > 
> > 
> > Do you have any ideas what could be wrong or how to determine
> > the root cause of this this issue?
> > 
> > Thanks in advance.
> > Vojta
> > 
> > 
> > [1] https://bugzilla.redhat.com/1898049
> > [2] libvirt debug log:
> > 
> > 2021-10-12 11:43:25.148+0000: 2006427: debug : qemuMonitorEmitIOError:1243
> > : mon=0x7fb02006a020 2021-10-12 11:43:25.148+0000: 2006427: info :
> > virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020 2021-10-12
> > 11:43:25.148+0000: 2006427: info : virObjectRef:402 : OBJECT_REF:
> > obj=0x7fafd0130020 2021-10-12 11:43:25.148+0000: 2000208: info :
> > virObjectRef:402 : OBJECT_REF: obj=0x7fafd010d340 2021-10-12
> > 11:43:25.148+0000: 2006427: info : virObjectNew:258 : OBJECT_NEW:
> > obj=0x7fb020082590 classname=virDomainEventIOError 2021-10-12
> > 11:43:25.148+0000: 2000208: info : vir_object_finalize:321 :
> > OBJECT_DISPOSE: obj=0x7fb020082500 2021-10-12 11:43:25.148+0000: 2006427:
> > info : virObjectNew:258 : OBJECT_NEW: obj=0x7fb020082620
> > classname=virDomainEventIOError 2021-10-12 11:43:25.148+0000: 2000208:
> > info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb020082500 2021-10-12
> > 11:43:25.148+0000: 2006427: debug : qemuProcessHandleIOError:907 :
> > Transitioned guest Windows-2016-2 to paused state due to IO error
> > 2021-10-12 11:43:25.148+0000: 2000208: info : virObjectUnref:380 :
> > OBJECT_UNREF: obj=0x7fafd010d340 2021-10-12 11:43:25.148+0000: 2006427:
> > info : virObjectNew:258 : OBJECT_NEW: obj=0x7fafbc1fb8c0
> > classname=virDomainEventLifecycle 2021-10-12 11:43:25.148+0000: 2006427:
> > debug : virDomainLockProcessPause:204 : plugin=0x7fafd01272a0
> > dom=0x7fb01400f5e0 state=0x7fb02401d768 2021-10-12 11:43:25.148+0000:
> > 2006427: debug : virDomainLockManagerNew:134 : plugin=0x7fafd01272a0
> > dom=0x7fb01400f5e0 withResources=1 2021-10-12 11:43:25.148+0000: 2006427:
> > debug : virLockManagerPluginGetDriver:276 : plugin=0x7fafd01272a0
> > 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerNew:300 :
> > driver=0x7fafd444a000 type=0 nparams=5 params=0x7fafd77de640 flags=0x0
> > 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:97
> > :   key=uuid type=uuid value=70eee88c-ba2c-4c6c-bd51-c2b663db27f8
> > 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:90
> > :   key=name type=string value=Windows-2016-2 2021-10-12
> > 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:78 :   key=id
> > type=uint value=1 2021-10-12 11:43:25.148+0000: 2006427: debug :
> > virLockManagerLogParams:78 :   key=pid type=uint value=2006418 2021-10-12
> > 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:93 :  
> > key=uri type=cstring value=(null) 2021-10-12 11:43:25.148+0000: 2006427:
> > debug : virDomainLockManagerNew:146 : Adding leases 2021-10-12
> > 11:43:25.148+0000: 2006427: debug : virDomainLockManagerNew:151 : Adding
> > disks 2021-10-12 11:43:25.149+0000: 2006427: debug :
> > virDomainLockManagerAddImage:90 : Add disk
> > /rhev/data-center/mnt/blockSD/7c4f09b6-9e87-436f-bda9-22d1f0b50955/images
> > /f5d6e074-dfe9-462d-8cfd-3e14b0eb5aea/766e36b2-84a6-43e7-a48b-a5f47e669860
> > 2021-10-12 11:43:25.149+0000: 2006427: debug :
> > virLockManagerAddResource:326 : lock=0x7fafbc19e250 type=0
> > name=/rhev/data-center/mnt/blockSD/7c4f09b6-9e87-436f-bda9-22d1f0b50955/i
> > mages/f5d6e074-dfe9-462d-8cfd-3e14b0eb5aea/766e36b2-84a6-43e7-a48b-a5f47e6
> > 69860 nparams=0 params=(nil) flags=0x0 2021-10-12 11:43:25.149+0000:
> > 2006427: debug : virDomainLockManagerAddImage:90 : Add disk
> > /dev/mapper/3600a09803830447a4f244c4657616f6f 2021-10-12
> > 11:43:25.149+0000: 2006427: debug : virLockManagerAddResource:326 :
> > lock=0x7fafbc19e250 type=0
> > name=/dev/mapper/3600a09803830447a4f244c4657616f6f nparams=0 params=(nil)
> > flags=0x2 2021-10-12 11:43:25.149+0000: 2006427: debug :
> > virLockManagerRelease:359 : lock=0x7fafbc19e250 state=0x7fb02401d768
> > flags=0x0 2021-10-12 11:43:25.149+0000: 2006427: debug :
> > virLockManagerFree:381 : lock=0x7fafbc19e250 2021-10-12
> > 11:43:25.149+0000: 2006427: debug : qemuProcessHandleIOError:920 :
> > Preserving lock state '<null>' 2021-10-12 11:43:25.150+0000: 2006427:
> > info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fafd0130020 2021-10-12
> > 11:43:25.150+0000: 2006427: info : virObjectUnref:380 : OBJECT_UNREF:
> > obj=0x7fb02006a020 2021-10-12 11:43:25.150+0000: 2000208: info :
> > virObjectRef:402 : OBJECT_REF: obj=0x7fafd010d340 2021-10-12
> > 11:43:25.150+0000: 2000208: info : vir_object_finalize:321 :
> > OBJECT_DISPOSE: obj=0x7fb020082590 2021-10-12 11:43:25.150+0000: 2006427:
> > info : virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020 2021-10-12
> > 11:43:25.150+0000: 2000208: info : virObjectUnref:380 : OBJECT_UNREF:
> > obj=0x7fb020082590 2021-10-12 11:43:25.150+0000: 2006427: info :
> > virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb02006a020 2021-10-12
> > 11:43:25.150+0000: 2000208: info : virObjectNew:258 : OBJECT_NEW:
> > obj=0x564a89a4cc60 classname=virDomain 2021-10-12 11:43:25.150+0000:
> > 2006427: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb02006a020
> > 2021-10-12 11:43:25.150+0000: 2000208: info : virObjectRef:402 :
> > OBJECT_REF: obj=0x7fafd0018ca0 2021-10-12 11:43:25.150+0000: 2000208:
> > info : virObjectRef:402 : OBJECT_REF: obj=0x564a89978df0 2021-10-12
> > 11:43:25.150+0000: 2000208: debug : virAccessManagerCheckDomain:238 :
> > manager=0x564a89978df0(name=stack) driver=QEMU domain=0x7ffd2c677010
> > perm=0 2021-10-12 11:43:25.150+0000: 2000208: debug :
> > virAccessManagerCheckDomain:238 : manager=0x564a89978e50(name=none)
> > driver=QEMU domain=0x7ffd2c677010 perm=0 2021-10-12 11:43:25.150+0000:
> > 2000208: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x564a89978df0
> > 2021-10-12 11:43:25.150+0000: 2000208: debug :
> > remoteRelayDomainEventIOErrorReason:529 : Relaying domain io error
> > Windows-2016-2 1 /dev/mapper/3600a09803830447a4f244c4657616f6f
> > ua-26b4975e-e1d4-4e27-b2c6-2ea0894a571b 1 , callback 3
> Unfortunately, you cut off the log a bit early/late; there should be a
> message from qemu's monitor mentioning BLOCK_IO_ERROR; Can you check
> your logs please?

yes, it's few lines earlier:

2021-10-12 11:43:22.297+0000: 2000213: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x564a89a1d010
2021-10-12 11:43:22.297+0000: 2000213: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x564a89a46360
2021-10-12 11:43:22.297+0000: 2000208: debug : virNetMessageFree:85 : msg=0x564a89a3d510 nfds=0 cb=(nil)
2021-10-12 11:43:22.297+0000: 2000208: debug : virNetServerClientCalculateHandleMode:154 : tls=(nil) hs=-1, rx=0x564a89a48620 tx=(nil)
2021-10-12 11:43:22.297+0000: 2000208: debug : virNetServerClientCalculateHandleMode:185 : mode=01
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020
2021-10-12 11:43:25.148+0000: 2006427: debug : qemuMonitorEmitEvent:1166 : mon=0x7fb02006a020 event=BLOCK_IO_ERROR
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020
2021-10-12 11:43:25.148+0000: 2006427: debug : qemuProcessHandleEvent:574 : vm=0x7fb01400f5e0
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectNew:258 : OBJECT_NEW: obj=0x7fb020082500 classname=virDomainQemuMonitorEvent
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb02006a020
2021-10-12 11:43:25.148+0000: 2006427: debug : qemuMonitorEmitIOError:1243 : mon=0x7fb02006a020

However, it seem the issue was caused by using error_policy='stop' in

<driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>

I need to double check, but switching to error_policy='report' seems to fix the issue
and HA validator succeeds (probably one of the test is testing concurrent reservations
and validating that one reservation fails?).

Thanks
Vojta

Attachment: signature.asc
Description: This is a digitally signed message part.


[Index of Archives]     [Virt Tools]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]

  Powered by Linux