Re: 4.14.18 -> 4.14.24 - almost all guests hanged

Hi Nikola,

Thanks for reporting the problem. Some questions inline.

2018-03-05 9:36 GMT+01:00 Nikola Ciprich <nikola.ciprich@xxxxxxxxxxx>:
> Hi,
>
> I'd like to report that when upgrading our cluster from 4.14.18 to
> 4.14.24-rc1 (with live guest migration), almost none of the guests survived.
What's your hardware setup? Is it Intel with IBPB-enabled microcode?
Do the guests hang right after live migration?

Are you able to reproduce the problem? Does it work with the latest upstream?

I'm not sure it helps, but the following patch is missing from 4.14.24:

commit 37b95951c58fdf08dc10afa9d02066ed9f176fb5 upstream.

kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
to fix it.

Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set)
Reported-by: Jeremi Piotrowski <jeremi.piotrowski@xxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
Signed-off-by: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
Signed-off-by: Radim Krčmář <rkrcmar@xxxxxxxxxx>
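
To illustrate the class of bug that commit message describes, here is a minimal standalone sketch. It is not the kernel's kvm_valid_sregs() code; the helper names (cr0_pg_set_buggy and friends) are made up for illustration, and only the X86_CR0_PG / X86_CR4_PAE naming and bit positions follow the kernel headers. The *_BIT macros are bit numbers, so ANDing a register value with them tests the wrong bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers and the corresponding masks (kernel-style naming). */
#define X86_CR0_PG_BIT   31
#define X86_CR0_PG       (UINT64_C(1) << X86_CR0_PG_BIT)
#define X86_CR4_PAE_BIT  5
#define X86_CR4_PAE      (UINT64_C(1) << X86_CR4_PAE_BIT)

static_assert(X86_CR0_PG == 0x80000000u, "CR0.PG is bit 31");
static_assert(X86_CR4_PAE == 0x20u, "CR4.PAE is bit 5");

/* Buggy: ANDs with the bit number 31 (0x1f), so it tests bits 0-4. */
static inline bool cr0_pg_set_buggy(uint64_t cr0)
{
	return (cr0 & X86_CR0_PG_BIT) != 0;
}

/* Fixed: ANDs with the mask, so it tests bit 31 only. */
static inline bool cr0_pg_set_fixed(uint64_t cr0)
{
	return (cr0 & X86_CR0_PG) != 0;
}

/* Buggy: ANDs with the bit number 5 (0b101), so it tests bits 0 and 2. */
static inline bool cr4_pae_set_buggy(uint64_t cr4)
{
	return (cr4 & X86_CR4_PAE_BIT) != 0;
}

/* Fixed: ANDs with the mask, so it tests bit 5 only. */
static inline bool cr4_pae_set_fixed(uint64_t cr4)
{
	return (cr4 & X86_CR4_PAE) != 0;
}
```

With the values from the quoted dump, the two variants disagree: for the guest CR0 of 0x30 the buggy check reports paging as enabled while the fixed one does not, and for the host CR4 of 0x626e0 the buggy check misses PAE even though bit 5 is set.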

Regards,
Jack
>
> I noticed that most of them got stuck in the "paused" state with no
> possibility to resume (virsh just reported that the guest cannot be
> continued and needs to be rebooted).
>
> In dmesg, many messages like the following appeared:
>
> [  116.593508] device vnet0 entered promiscuous mode
> [  124.143532] *** Guest State ***
> [  124.143594] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
> [  124.143668] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
> [  124.143871] CR3 = 0x00000000feffc000
> [  124.143984] RSP = 0xffffffff82003e98  RIP = 0xffffffff816df002
> [  124.144102] RFLAGS=0x00000246         DR7 = 0x0000000000000400
> [  124.144221] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
> [  124.144341] CS:   sel=0xf000, attr=0x0009b, limit=0x0000ffff, base=0x00000000ffff0000
> [  124.144516] DS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> [  124.144692] SS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> [  124.144907] ES:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> [  124.145089] FS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> [  124.145272] GS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> [  124.145447] GDTR:                           limit=0x0000ffff, base=0x0000000000000000
> [  124.145626] LDTR: sel=0x0000, attr=0x00082, limit=0x0000ffff, base=0x0000000000000000
> [  124.145814] IDTR:                           limit=0x0000ffff, base=0x0000000000000000
> [  124.145995] TR:   sel=0x0000, attr=0x0008b, limit=0x0000ffff, base=0x0000000000000000
> [  124.146173] EFER =     0x0000000000000000  PAT = 0x0007040600070406
> [  124.146292] DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
> [  124.146466] Interruptibility = 00000000  ActivityState = 00000000
> [  124.146579] *** Host State ***
> [  124.146687] RIP = 0xffffffffa046a817  RSP = 0xffffc900200a7cb8
> [  124.146832] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
> [  124.146961] FSBase=00007fe82eff7700 GSBase=ffff881fffb40000 TRBase=fffffe00000df000
> [  124.147144] GDTBase=fffffe00000dd000 IDTBase=fffffe0000000000
> [  124.147262] CR0=0000000080050033 CR3=0000001f5b8fe004 CR4=00000000000626e0
> [  124.147381] Sysenter RSP=fffffe00000de200 CS:RIP=0010:ffffffff81801f60
> [  124.147499] EFER = 0x0000000000000d01  PAT = 0x0407050600070106
> [  124.147614] *** Control State ***
> [  124.147734] PinBased=0000007f CPUBased=96a1e9fa SecondaryExec=000004f2
> [  124.147849] EntryControls=0000d1ff ExitControls=002fefff
> [  124.147965] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
> [  124.148085] VMEntry: intr_info=80000081 errcode=00000000 ilen=00000000
> [  124.148201] VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
> [  124.148318]         reason=80000021 qualification=0000000000000000
> [  124.148432] IDTVectoring: info=00000000 errcode=00000000
> [  124.148545] TSC Offset = 0xffed7296fb06bc34
> [  124.148655] TPR Threshold = 0x00
> [  124.148770] EPT pointer = 0x0000001f1a0af01e
> [  124.148882] PLE Gap=00000080 Window=00001000
> [  124.148995] Virtual processor ID = 0x0001
>
> (never seen anything like that)
>
> I haven't yet gone through all the patches between those two versions, so I
> don't have any suspicion yet. If anyone recognizes this as a known problem,
> please let me know.
>
> I'm going to check whether I can reproduce the problem.
>
> BR
>
> nik
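
For reading the "reason=80000021" field in the quoted dump, a small sketch of how the VMX exit reason splits into its parts may help. Per the Intel SDM, bit 31 flags a failed VM entry and bits 15:0 hold the basic exit reason; basic reason 33 (0x21) is "VM-entry failure due to invalid guest state". The two macro names match the kernel's vmx.h, but the helper functions are hypothetical and written only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the exit reason: the VM entry itself failed. */
#define VMX_EXIT_REASONS_FAILED_VMENTRY UINT32_C(0x80000000)
/* Basic exit reason 33: VM-entry failure due to invalid guest state. */
#define EXIT_REASON_INVALID_STATE 33

/* Returns true when the exit reason reports a failed VM entry. */
static inline bool vmentry_failed(uint32_t reason)
{
	return (reason & VMX_EXIT_REASONS_FAILED_VMENTRY) != 0;
}

/* Extracts the basic exit reason from bits 15:0. */
static inline uint16_t basic_exit_reason(uint32_t reason)
{
	return (uint16_t)(reason & 0xffff);
}
```

Applied to 0x80000021, vmentry_failed() is true and basic_exit_reason() yields 33, i.e. the CPU rejected the guest state that was loaded on VM entry.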



