This patch series is the result of integrating my dirty logging optimization work, including preparation for the new GET_DIRTY_LOG API, and an attempt to get rid of the controversial synchronize_srcu_expedited().

  1 - KVM: MMU: Split the main body of rmap_write_protect() off from others
  2 - KVM: Avoid checking huge page mappings in get_dirty_log()
  3 - KVM: Switch to srcu-less get_dirty_log()
  4 - KVM: Remove unused dirty_bitmap_head and nr_dirty_pages

Although there are still some remaining tasks, the test results obtained look very promising.

Remaining tasks:

- Implement set_bit_le() for mark_page_dirty()

  Some drivers are using their own implementation of it, and a bit of
  work is needed to make it generic.  I want to do this separately
  later because it cannot be done within the kvm tree.

- Stop allocating the extra dirty bitmap buffer area

  According to Peter, mmu_notifier has become preemptible.  If we can
  change mmu_lock from a spin_lock to a mutex, as Avi suggested
  before, this would be straightforward because we could use
  __put_user() right after xchg() with mmu_lock held.

Test results:

1. dirty-log-perf unit test (on Sandy Bridge core-i3 32-bit host)

With the changes added since the previous post, the performance was
much improved: now, even when every page in the slot is dirty, the
numbers are reasonably close to the original ones.  For the other
cases, needless to say, we achieved a very nice improvement.
- kvm.git next

   average(ns)      stdev    ns/page  pages
      147018.6    77604.9   147018.6      1
      158080.2    82211.9    79040.1      2
      127555.6    80619.8    31888.9      4
      108865.6    78499.3    13608.2      8
      114707.8    43508.6     7169.2     16
       76679.0    37659.8     2396.2     32
       59159.8    20417.1      924.3     64
       60418.2    19405.7      472.0    128
       76267.0    21450.5      297.9    256
      113182.0    22684.9      221.0    512
      930344.2   153766.5      908.5     1K
      939098.2   163800.3      458.5     2K
      996813.4    77921.0      243.3     4K
     1113232.6   107782.6      135.8     8K
     1241206.4    82282.5       75.7    16K
     1529526.4   116388.2       46.6    32K
     2147538.4   227375.9       32.7    64K
     3309619.4    79356.8       25.2   128K
     6016951.8   549873.4       22.9   256K

- kvm.git next + srcu-less series

   average(ns)      stdev    ns/page  pages  improvement(%)
       14086.0     3532.3    14086.0      1     944
       13303.6     3317.7     6651.8      2    1088
       13455.6     3315.2     3363.9      4     848
       14125.8     3435.4     1765.7      8     671
       15322.4     3690.1      957.6     16     649
       17026.6     4037.2      532.0     32     350
       21258.6     4852.3      332.1     64     178
       33845.6    14115.8      264.4    128      79
       37893.0      681.8      148.0    256     101
       61707.4     1057.6      120.5    512      83
       88861.4     2131.0       86.7     1K     947
      151315.6     6490.5       73.8     2K     521
      290579.6     8523.0       70.9     4K     243
      518231.0    20412.6       63.2     8K     115
     2271171.4    12064.9      138.6    16K     -45
     3375866.2    14743.3      103.0    32K     -55
     4408395.6    10720.0       67.2    64K     -51
     5915336.2    26538.1       45.1   128K     -44
     8497356.4    16441.0       32.4   256K     -29

Note that when the number of dirty pages was large, we spent less
than 100ns to get the information for one dirty page: see the
ns/page column.  As Avi noted before, this is much faster than the
time userspace needs to send one page to the destination node.

Furthermore, with the already proposed new GET_DIRTY_LOG API, we
will be able to restrict the area from which we get the log, so we
will not need to care about the millisecond-order latency observed
for very large numbers of dirty pages.

2. real workloads (on Xeon W3520 64-bit host)

I traced kvm_vm_ioctl_get_dirty_log() during heavy VGA updates and
during live migration.

2.1. VGA: guest was doing "x11perf -rect1 -rect10 -rect100 -rect500"

As can be guessed from the results of dirty-log-perf, we observed a
very nice improvement.

- kvm.git next
  For heavy updates: 100us to 300us.
  Worst: 300us.

- kvm.git next + srcu-less series
  For heavy updates: 3us to 10us.
  Worst: 50us.

2.2. live migration: guest was doing "dd if=/path/to/a/file of=/dev/null"

The improvement was significant again.

- kvm.git next
  For heavy updates: 1ms to 3ms.

- kvm.git next + srcu-less series
  For heavy updates: 50us to 300us.

We probably gained a lot from the locality of the WSS (working set).

	Takuya