Hi, Ben,

On Thu, Feb 04, 2021 at 02:19:59PM -0800, Ben Gardon wrote:
> The average time for each run demonstrated a strange bimodal distribution,
> with clusters around 2 seconds and 2.5 seconds. This may have been a
> result of vCPU migration between NUMA nodes.

Have you thought about using numactl or a similar technique to verify that
idea (forcing both the vCPU thread bindings and the memory allocations)?

From the numbers it already shows improvements indeed, but I'm just curious
since you raised this up. :)

> @@ -5707,13 +5708,18 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
>  	else
>  		pml_idx++;
>  
> +	memslots = kvm_vcpu_memslots(vcpu);
> +
>  	pml_buf = page_address(vmx->pml_pg);
>  	for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
> +		struct kvm_memory_slot *memslot;
>  		u64 gpa;
>  
>  		gpa = pml_buf[pml_idx];
>  		WARN_ON(gpa & (PAGE_SIZE - 1));
> -		kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
> +
> +		memslot = __gfn_to_memslot(memslots, gpa >> PAGE_SHIFT);
> +		mark_page_dirty_in_slot(vcpu->kvm, memslot, gpa >> PAGE_SHIFT);

While at it: make "gpa >> PAGE_SHIFT" a temp var too?
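Untested, and "gfn" is just the first name that came to mind, but relative
to the hunk above I mean something like:

		struct kvm_memory_slot *memslot;
		u64 gpa;
		gfn_t gfn;	/* "gfn" is an arbitrary local name */

		gpa = pml_buf[pml_idx];
		WARN_ON(gpa & (PAGE_SIZE - 1));
		gfn = gpa >> PAGE_SHIFT;

		memslot = __gfn_to_memslot(memslots, gfn);
		mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);

Thanks,

-- 
Peter Xu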