On 18/12/19 17:32, Peter Xu wrote:
>> With PML it is.  Without PML, however, it would be much slower to
>> synchronize the dirty bitmap from KVM to userspace (one atomic operation
>> per page instead of one per 64 pages) and even impossible to have the
>> dirty ring.
>
> Indeed, however I think it'll be faster for hardware to mark page as
> dirty.  So could it be a tradeoff on whether we want the "collection"
> to be faster or "marking page dirty" to be faster?  IMHO "marking page
> dirty" could be even more important sometimes because that affects
> guest responsiveness (blocks vcpu execution), while the collection
> procedure can happen in parallel with that.

The problem is that the marking page dirty will be many, many times
slower, because you don't have this

	if (!dirty_bitmap[i])
		continue;

and instead you have to scan the whole of the page tables even if only
a handful of bits are set (reading 4K of memory for every 2M of guest
RAM).  This can be quite bad for the TLB too.  It is certainly possible
that it turns out to be faster, but I would be quite surprised and,
with PML, that is more or less moot.

Thanks,

Paolo
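
For illustration only, here is a rough userspace sketch of the two collection loops compared above, for one 2M region of guest RAM (512 4K pages).  It is not KVM code: the function names, the flat PTE array and the PTE_DIRTY bit position are assumptions made up for the example, and the atomic exchange a real bitmap harvest would do on each non-zero word is left out.  Each function returns how many 8-byte loads it had to issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_TABLE 512          /* one 4K page table maps 2M of guest RAM */
#define PTE_DIRTY      (1ULL << 6)  /* illustrative dirty-bit position (assumption) */

/*
 * Bitmap collection: one 64-bit load covers 64 pages, and an all-clean
 * word is skipped right after the load.
 */
size_t collect_from_bitmap(const uint64_t *dirty_bitmap, size_t nwords,
                           size_t *ndirty)
{
    size_t loads = 0;

    *ndirty = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = dirty_bitmap[i];

        loads++;
        if (!w)
            continue;               /* 64 clean pages dismissed at once */
        while (w) {                 /* count the set bits in this word */
            w &= w - 1;
            (*ndirty)++;
        }
    }
    return loads;
}

/*
 * Page-table collection: every PTE has to be read to test its dirty
 * bit, so the cost does not depend on how few pages are dirty.
 */
size_t collect_from_ptes(const uint64_t *ptes, size_t nptes, size_t *ndirty)
{
    size_t loads = 0;

    assert(nptes % PTES_PER_TABLE == 0);  /* whole page tables only */
    *ndirty = 0;
    for (size_t i = 0; i < nptes; i++) {
        loads++;
        if (ptes[i] & PTE_DIRTY)
            (*ndirty)++;
    }
    return loads;
}
```

With three dirty pages in the region, the bitmap walk issues 8 loads (64 bytes) and the page-table walk issues 512 loads (4K) to find the same three pages.  The sketch only counts memory reads; it says nothing about TLB effects or about the cost of marking pages dirty in the first place.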