RE: Optimized clocksource with AMD AVIC enabled for Windows guest

Kechen Lu <kechenl@xxxxxxxxxx> writes:

> Hi Vitaly and Paolo,
>
> Sorry for the delayed response. I finally got access to a machine with AVIC and was able to test the patch and reconfirm the results with some benchmarks and tests today :)
>  
> In summary, this patch works well and resolves the high rate of port I/O vmexits caused by the clocksource. With AVIC=1 && stimer/synic=1:
>  
> 1. The CPU-intensive workload CPU-Z shows a 15% improvement in the SingleThread score (382.1 => 441.7).
>  
> 2. The disk I/O-intensive workload Passmark Disk Test shows a 4% improvement (12706 => 13265).
>  
> 3. The vmexit pattern from a 30s recording while running the CPU workload Geekbench in the guest shows a dramatic 90.7% decrease in port I/O vmexits, as well as decreases in HLT and NPF vmexits, when we get the stimer benefit on top of AVIC. Details below:
>  
> AVIC=1 && stimer/synic=0 && vapic=0:
>  
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
>  
>                   io     344654    68.29%     1.10%      0.67us   2132.72us      7.01us ( +-   0.19% )
>                  hlt     114046    22.60%    98.85%      0.42us  16666.32us   1903.26us ( +-   0.66% )
> avic_incomplete_ipi      19679     3.90%     0.03%      0.38us     22.67us      3.66us ( +-   0.71% )
>                  npf       8186     1.62%     0.01%      0.37us    235.76us      1.46us ( +-   4.20% )
>             ........                      
>
>  
> AVIC=1 && stimer/synic=1 && vapic=0:
>  
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
>  
>                   io      31995    38.61%     0.10%      2.79us     65.83us      6.70us ( +-   0.35% )
>                  hlt      22915    27.65%    99.88%      0.42us  15959.14us   9535.38us ( +-   0.50% )
> avic_incomplete_ipi       8271     9.98%     0.01%      0.39us     79.03us      3.58us ( +-   1.23% )
>                  npf       1232     1.49%     0.00%      0.36us    100.25us      2.58us ( +-   6.98% )
>             ........
>
> While testing, I also found out hv-vapic should be disabled as well to
> make AVIC fully functional, otherwise it shows high vmexits due to MSR
> writes which seems to be due to  increased access to HV_X64_MSR_EOI
> and HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with
> PV EOI/ICR accesses. So far I think AVIC=1 && hv-vapic=0 &&
> stimer/synic=1 combination gives us the best performance. However,
> AVIC=1 && hv-vapic=0 && stimer/synic=1 is really unstable, and
> sometimes would lead to boot. Wanted to understand if instabilities
> with APICv/AVIC is a known bug/issue in upstream? Attached the
> reproducible kernel warning in the bottom.

Now it's my turn to apologize for the delayed reply :-)

I think it's our fault.

BIT(3) in HYPERV_CPUID_ENLIGHTMENT_INFO is
HV_X64_APIC_ACCESS_RECOMMENDED, which can be deciphered as

"Recommend using MSRs for accessing APIC registers EOI, ICR and TPR
rather than their memory-mapped counterparts"

And we shouldn't be setting it with AVIC. The following hack is supposed
to help:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c8f2592ccc99..66ee85a83e9a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -145,6 +145,13 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
                                           vcpu->arch.ia32_misc_enable_msr &
                                           MSR_IA32_MISC_ENABLE_MWAIT);
        }
+
+       /* Dirty hack: clear HV_X64_APIC_ACCESS_RECOMMENDED, force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
+       best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
+       if (best) {
+               best->eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+               best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
+       }
 }
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);

(we'll need to find a proper way to set these settings in QEMU).
 
Could you give it a spin? ("AVIC=1 && hv-vapic=1 && stimer/synic=1" configuration)

>  
> In all, AVIC=1 && hv-vapic=1 && stimer/synic=1 can now work stably and still greatly reduces vmexits. Thank you all for the help. I hope the kernel patch and the QEMU patch exposing the bit can get upstream soon, along with fixes for the instabilities.
>  
> Best Regards,
> Kechen
>
> ---------------------------------------------------------------------------------------
> [ 7962.437584] ------------[ cut here ]------------
> [ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
> [ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349 avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
> [ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds snd_seq snd_pcm snd_seq_device snd_timer snd soundcore k10temp mac_hid sch_fq_codel lm92 parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf hid_generic usbhid hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci xhci_pci_renesas i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
> [ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P        W  OE     5.8.0-41-generic #46
> [ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]

No, this is not something I'm aware of. Do you know if it reproduces on
the latest upstream?

> [ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89 e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b e9 04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
> [ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
> [ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99347f118cd8
> [ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff99347f118cd0
> [ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09: 0000000000000831
> [ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12: 040000000000082f
> [ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15: 0000000000000002
> [ 7962.437640] FS:  0000000000000000(0053) GS:ffff99347f100000(002b) knlGS:fffff80470728000
> [ 7962.437640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4: 0000000000340ee0
> [ 7962.437641] Call Trace:
> [ 7962.437646]  handle_exit+0x134/0x420 [kvm_amd]
> [ 7962.437661]  ? kvm_set_cr8+0x22/0x40 [kvm]
> [ 7962.437674]  vcpu_enter_guest+0x862/0xd90 [kvm]
> [ 7962.437687]  vcpu_run+0x76/0x240 [kvm]
> [ 7962.437699]  kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
> [ 7962.437711]  kvm_vcpu_ioctl+0x247/0x600 [kvm]
> [ 7962.437714]  ksys_ioctl+0x8e/0xc0
> [ 7962.437715]  __x64_sys_ioctl+0x1a/0x20
> [ 7962.437717]  do_syscall_64+0x49/0xc0
> [ 7962.437719]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 7962.437720] RIP: 0033:0x7f4c09b1131b
> [ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
> [ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4c09b1131b
> [ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
> [ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09: 0000000000000004
> [ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15: 00007f4bedffb640
> [ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---
>

-- 
Vitaly



