2018-03-07 21:29 GMT+01:00 Nikola Ciprich <nikola.ciprich@xxxxxxxxxxx>:
> Hi,
>
>> > > I'd like to report that when upgrading our cluster from 4.14.18 to
>> > > 4.14.24-rc1 (with live migration of guests), almost none of the guests survived..
>> > What's your hardware setup? Intel with IBPB-enabled microcode?
>> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>>
>> so I suppose there's no IBPB (at least the Meltdown checker reports so)
>>
>>
>> > Do the guests hang right after live migration?
>> yes, just tried it.
>>
>>
>> >
>> > Are you able to reproduce the problem? Does it work with the latest upstream?
>> yup, I'm able to reproduce it quickly. I'll revert the cluster to 4.14.18 now,
>> but I'll set up a test system right afterwards and test the patch you've proposed.
>>
>> >
>> > Not sure it helps, but the following patch is missing in 4.14.24:
>> >
>> > commit 37b95951c58fdf08dc10afa9d02066ed9f176fb5 upstream.
>> >
>> > kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
>> > status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
>> > to fix it.
>> >
>> > Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set)
>> > Reported-by: Jeremi Piotrowski <jeremi.piotrowski@xxxxxxxxx>
>> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> > Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>> > Signed-off-by: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
>> > Signed-off-by: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>>
>> I'll test and report.
>
> so indeed, this one on top of 4.14.24-rc1 fixes the migration for me.
> Greg, could you queue this one up, please?
>
> Jack, thanks for the hint!
> BR
> nik

Hi Nik,

Thanks for testing and letting us know. The patch is already queued in
4.14.25-rc1 by Greg. There are also some other KVM fixes and performance
enhancements.

Regards,
Jack

>
>
>
>>
>> n.
>>
>>
>> >
>> > Regards,
>> > Jack
>> > >
>> > > I noticed that most of them got stuck in the "paused" state with no
>> > > possibility to resume (virsh just reported that the guest cannot be
>> > > continued and needs to be rebooted).
>> > >
>> > > In dmesg, lots of the following messages appeared:
>> > >
>> > > [ 116.593508] device vnet0 entered promiscuous mode
>> > > [ 124.143532] *** Guest State ***
>> > > [ 124.143594] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
>> > > [ 124.143668] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
>> > > [ 124.143871] CR3 = 0x00000000feffc000
>> > > [ 124.143984] RSP = 0xffffffff82003e98 RIP = 0xffffffff816df002
>> > > [ 124.144102] RFLAGS=0x00000246 DR7 = 0x0000000000000400
>> > > [ 124.144221] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
>> > > [ 124.144341] CS: sel=0xf000, attr=0x0009b, limit=0x0000ffff, base=0x00000000ffff0000
>> > > [ 124.144516] DS: sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.144692] SS: sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.144907] ES: sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.145089] FS: sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.145272] GS: sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.145447] GDTR: limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.145626] LDTR: sel=0x0000, attr=0x00082, limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.145814] IDTR: limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.145995] TR: sel=0x0000, attr=0x0008b, limit=0x0000ffff, base=0x0000000000000000
>> > > [ 124.146173] EFER = 0x0000000000000000 PAT = 0x0007040600070406
>> > > [ 124.146292] DebugCtl = 0x0000000000000000 DebugExceptions = 0x0000000000000000
>> > > [ 124.146466] Interruptibility = 00000000 ActivityState = 00000000
>> > > [ 124.146579] *** Host State ***
>> > > [ 124.146687] RIP = 0xffffffffa046a817 RSP = 0xffffc900200a7cb8
>> > > [ 124.146832] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
>> > > [ 124.146961] FSBase=00007fe82eff7700 GSBase=ffff881fffb40000 TRBase=fffffe00000df000
>> > > [ 124.147144] GDTBase=fffffe00000dd000 IDTBase=fffffe0000000000
>> > > [ 124.147262] CR0=0000000080050033 CR3=0000001f5b8fe004 CR4=00000000000626e0
>> > > [ 124.147381] Sysenter RSP=fffffe00000de200 CS:RIP=0010:ffffffff81801f60
>> > > [ 124.147499] EFER = 0x0000000000000d01 PAT = 0x0407050600070106
>> > > [ 124.147614] *** Control State ***
>> > > [ 124.147734] PinBased=0000007f CPUBased=96a1e9fa SecondaryExec=000004f2
>> > > [ 124.147849] EntryControls=0000d1ff ExitControls=002fefff
>> > > [ 124.147965] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
>> > > [ 124.148085] VMEntry: intr_info=80000081 errcode=00000000 ilen=00000000
>> > > [ 124.148201] VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
>> > > [ 124.148318] reason=80000021 qualification=0000000000000000
>> > > [ 124.148432] IDTVectoring: info=00000000 errcode=00000000
>> > > [ 124.148545] TSC Offset = 0xffed7296fb06bc34
>> > > [ 124.148655] TPR Threshold = 0x00
>> > > [ 124.148770] EPT pointer = 0x0000001f1a0af01e
>> > > [ 124.148882] PLE Gap=00000080 Window=00001000
>> > > [ 124.148995] Virtual processor ID = 0x0001
>> > >
>> > > (I've never seen anything like that)
>> > >
>> > > I haven't yet gone through all the patches between those two versions, so
>> > > I don't have any suspicion yet. If anyone recognizes this as a known
>> > > problem, please let me know.
>> > >
>> > > I'm going to try to reproduce the problem.
>> > >
>> > > BR
>> > >
>> > > nik
>> >
>>
>> --
>> -------------------------------------
>> Ing. Nikola CIPRICH
>> LinuxBox.cz, s.r.o.
>> 28.rijna 168, 709 00 Ostrava
>>
>> tel.: +420 591 166 214
>> fax: +420 596 621 273
>> mobile: +420 777 093 799
>> www.linuxbox.cz
>>
>> mobile (service): +420 737 238 656
>> email (service): servis@xxxxxxxxxxx
>> -------------------------------------
>>
>