Hi Avi, Thanks for viewing! On Wednesday 22 June 2011 18:43:30 Avi Kivity wrote: > On 06/21/2011 04:32 PM, Nai Xia wrote: > > Introduced kvm_mmu_notifier_test_and_clear_dirty(), kvm_mmu_notifier_dirty_update() > > and their mmu_notifier interfaces to support KSM dirty bit tracking, which brings > > significant performance gain in volatile pages scanning in KSM. > > Currently, kvm_mmu_notifier_dirty_update() returns 0 if and only if intel EPT is > > enabled to indicate that the dirty bits of underlying sptes are not updated by > > hardware. > > > > > Can you quantify the performance gains? Compared with checksum based approach, the speed up for volatile host working set is about 8 times on normal pages, 16 times on transhuge page. I have not collect the figures in guest os yet. I'll be back with these numbers in guest. > > > +int kvm_test_and_clear_dirty_rmapp(struct kvm *kvm, unsigned long *rmapp, > > + unsigned long data) > > +{ > > + u64 *spte; > > + int dirty = 0; > > + > > + if (!shadow_dirty_mask) { > > + WARN(1, "KVM: do NOT try to test dirty bit in EPT\n"); > > + goto out; > > + } > > + > > + spte = rmap_next(kvm, rmapp, NULL); > > + while (spte) { > > + int _dirty; > > + u64 _spte = *spte; > > + BUG_ON(!(_spte& PT_PRESENT_MASK)); > > + _dirty = _spte& PT_DIRTY_MASK; > > + if (_dirty) { > > + dirty = 1; > > + clear_bit(PT_DIRTY_SHIFT, (unsigned long *)spte); > > + } > > Racy. Also, needs a tlb flush eventually. > > > + spte = rmap_next(kvm, rmapp, spte); > > + } > > +out: > > + return dirty; > > +} > > + > > #define RMAP_RECYCLE_THRESHOLD 1000 > > > > > > struct mmu_notifier_ops { > > + int (*dirty_update)(struct mmu_notifier *mn, > > + struct mm_struct *mm); > > + > > I prefer to have test_and_clear_dirty() always return 1 in this case (if > the spte is writeable), and drop this callback. If test_and_clear_dirty() always return 1, how can ksmd tell if it's a real dirty page or just casued by EPT and ksmd should just fallback to checksum based approach? > > +int __mmu_notifier_dirty_update(struct mm_struct *mm) > > +{ > > + struct mmu_notifier *mn; > > + struct hlist_node *n; > > + int dirty_update = 0; > > + > > + rcu_read_lock(); > > + hlist_for_each_entry_rcu(mn, n,&mm->mmu_notifier_mm->list, hlist) { > > + if (mn->ops->dirty_update) > > + dirty_update |= mn->ops->dirty_update(mn, mm); > > + } > > + rcu_read_unlock(); > > + > > Should it not be &= instead? I think the logic is "if _any_ underlying MMU is going to update the bit, then this bit is not dead, we can query it throught test_and_clear....". ksmd should not care about which one dirties the page, as long as it's dirty, it can be skipped. Did I miss sth? Thanks, Nai > > > + return dirty_update; > > +} > > + > > /* > > * This function can't run concurrently against mmu_notifier_register > > * because mm->mm_users> 0 during mmu_notifier_register and exit_mmap > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>