>>> > I agree it's either COW breaking or (similarly) locking pages that
>>> > the guest hasn't touched yet.
>>> >
>>> > You can use prealloc or "-rt mlock=on" to avoid this problem.
>>> >
>>> > Paolo
>>
>> Or the new shared flag - IIRC shared VMAs don't do COW either.
>
> Only if the problem isn't locking and zeroing of untouched pages
> (also, it is not upstream, is it?).
>
> Can you make a profile with perf?
>
> Paolo

With the "-rt mlock=on" option not set, perf top -p <qemu pid> shows:

  21699 root  20  0 24.2g  24g 5312 S  0 33.8  0:24.39 qemu-system-x8

   PerfTop:      95 irqs/sec  kernel:17.9%  us: 1.1%  guest kernel:47.4%  guest us:32.6%  exact: 0.0% [1000Hz cycles],  (target_pid: 15950)
----------------------------------------------------------------------------

   samples  pcnt function                              DSO
   _______ _____ _____________________________________ ___________

   2984.00 77.8% clear_page_c                          [kernel]
    135.00  3.5% gup_huge_pmd                          [kernel]
    134.00  3.5% pfn_to_dma_pte                        [kernel]
     83.00  2.2% __domain_mapping                      [kernel]
     63.00  1.6% update_memslots                       [kvm]
     59.00  1.5% prep_new_page                         [kernel]
     50.00  1.3% get_user_pages_fast                   [kernel]
     45.00  1.2% up_read                               [kernel]
     42.00  1.1% down_read                             [kernel]
     38.00  1.0% gup_pud_range                         [kernel]
     34.00  0.9% kvm_clear_async_pf_completion_queue   [kvm]
     18.00  0.5% intel_iommu_map                       [kernel]
     16.00  0.4% _cond_resched                         [kernel]
     16.00  0.4% gfn_to_hva                            [kvm]
     15.00  0.4% kvm_set_apic_base                     [kvm]
     15.00  0.4% load_vmcs12_host_state                [kvm_intel]
     14.00  0.4% clear_huge_page                       [kernel]
      7.00  0.2% intel_iommu_iova_to_phys              [kernel]
      6.00  0.2% is_error_pfn                          [kvm]
      6.00  0.2% iommu_map                             [kernel]
      6.00  0.2% native_write_msr_safe                 [kernel]
      5.00  0.1% find_vma                              [kernel]

With the "-rt mlock=on" option set, perf top -p <qemu pid> shows:

   PerfTop:     326 irqs/sec  kernel:17.5%  us: 2.8%  guest kernel:37.4%  guest us:42.3%  exact: 0.0% [1000Hz cycles],  (target_pid: 25845)
----------------------------------------------------------------------------

   samples  pcnt function                              DSO
   _______ _____ _____________________________________ ___________

    182.00 17.5% pfn_to_dma_pte                        [kernel]
    178.00 17.1% gup_huge_pmd                          [kernel]
     91.00  8.8% __domain_mapping                      [kernel]
     71.00  6.8% update_memslots                       [kvm]
     65.00  6.3% gup_pud_range                         [kernel]
     62.00  6.0% get_user_pages_fast                   [kernel]
     52.00  5.0% kvm_clear_async_pf_completion_queue   [kvm]
     50.00  4.8% down_read                             [kernel]
     37.00  3.6% up_read                               [kernel]
     26.00  2.5% intel_iommu_map                       [kernel]
     20.00  1.9% native_write_msr_safe                 [kernel]
     16.00  1.5% gfn_to_hva                            [kvm]
     14.00  1.3% load_vmcs12_host_state                [kvm_intel]
      8.00  0.8% find_busiest_group                    [kernel]
      8.00  0.8% _raw_spin_lock                        [kernel]
      8.00  0.8% hrtimer_interrupt                     [kernel]
      8.00  0.8% intel_iommu_iova_to_phys              [kernel]
      7.00  0.7% iommu_map                             [kernel]
      6.00  0.6% kvm_mmu_pte_write                     [kvm]
      6.00  0.6% is_error_pfn                          [kvm]
      5.00  0.5% kvm_set_apic_base                     [kvm]
      5.00  0.5% clear_page_c                          [kernel]
      5.00  0.5% iommu_iova_to_phys                    [kernel]

Without "-rt mlock=on", iommu_map has to allocate and zero many new
pages as it goes, and the zeroing is expensive: clear_page_c alone
accounts for 77.8% of the samples above. But whether or not
"-rt mlock=on" is set, the GPA->HPA DMAR page table must still be
built, and that is also expensive, taking about 1-2 seconds for 25GB
of guest memory (at 4KB granularity that is roughly 6.5 million IOMMU
entries to populate, which fits the pfn_to_dma_pte/__domain_mapping
samples in the second profile).

Thanks,
Zhang Haoyu
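
P.S. For reference, the two mitigations mentioned above spelled out as
command lines. This is a sketch, not a tested invocation: in the QEMU
builds I know of the long spellings are "-realtime mlock=on" and
"-mem-prealloc" (the latter together with "-mem-path"), spelling varies
across versions, and the memory size and hugepage path below are
placeholders:

  # pin and pre-touch all guest RAM up front
  qemu-system-x86_64 -m 25600 -realtime mlock=on ...

  # or preallocate guest RAM from hugepages
  qemu-system-x86_64 -m 25600 -mem-path /dev/hugepages -mem-prealloc ...

Either way the pages are resident before the VFIO/IOMMU mapping pass
runs, so clear_page_c drops out of the hot path, as the second profile
shows.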
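
And a minimal standalone sketch of the same idea in C, for anyone who
wants to reproduce the effect outside QEMU (this is not QEMU code; the
1 GiB size is a placeholder for guest RAM):

/* prefault.c - sketch of what prealloc / "-rt mlock=on" effectively do:
 * force allocation and zeroing of every page up front, so a later
 * get_user_pages_fast()/iommu_map pass finds the pages resident instead
 * of faulting and running clear_page_c inside the mapping loop. */
#define _GNU_SOURCE            /* for MAP_POPULATE on older toolchains */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1UL << 30;    /* 1 GiB demo size */

    /* MAP_POPULATE pre-faults every page at mmap() time, like QEMU's
     * prealloc, so the allocate-and-clear cost is paid here. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* mlock() additionally pins the pages, like "-rt mlock=on"; sizes
     * like this need RLIMIT_MEMLOCK raised or CAP_IPC_LOCK, so the
     * sketch only warns on failure. */
    if (mlock(buf, len) != 0)
        perror("mlock");

    printf("mapping is resident; DMA-mapping it would not fault\n");

    munlock(buf, len);
    munmap(buf, len);
    return 0;
}

Compile with e.g. "gcc -O2 prefault.c -o prefault"; the faulting cost
then shows up inside mmap() rather than later in the IOMMU mapping
path.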