On Thu, Jan 09, 2020 at 12:58:08PM -0500, Peter Xu wrote:
> On Thu, Jan 09, 2020 at 09:47:11AM -0700, Alex Williamson wrote:
> > On Thu, 9 Jan 2020 09:57:08 -0500
> > Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > > Branch is here: https://github.com/xzpeter/linux/tree/kvm-dirty-ring
> > > (based on kvm/queue)
> > >
> > > Please refer to either the previous cover letters, or documentation
> > > update in patch 12 for the big picture.  Previous posts:
> > >
> > > V1: https://lore.kernel.org/kvm/20191129213505.18472-1-peterx@xxxxxxxxxx
> > > V2: https://lore.kernel.org/kvm/20191221014938.58831-1-peterx@xxxxxxxxxx
> > >
> > > The major change in V3 is that we dropped the whole waitqueue and the
> > > global lock.  With that, we have clean per-vcpu ring and no default
> > > ring any more.  The two kvmgt refactoring patches were also included
> > > to show the dependency of the works.
> >
> > Hi Peter,
>
> Hi, Alex,
>
> >
> > Would you recommend this style of interface for vfio dirty page
> > tracking as well?  This mechanism seems very tuned to sparse page
> > dirtying, how well does it handle fully dirty, or even significantly
> > dirty regions?
>
> That's truly the point why I think the dirty bitmap can still be used
> and should be kept.  IIUC the dirty ring starts from COLO where (1)
> dirty rate is very low, and (2) sync happens frequently.  That's a
> perfect ground for dirty ring.  However it for sure does not mean that
> dirty ring can solve all the issues.  As you said, I believe the full
> dirty is another extreme in that dirty bitmap could perform better.
>
> > We also don't really have "active" dirty page tracking
> > in vfio, we simply assume that if a page is pinned or otherwise mapped
> > that it's dirty, so I think we'd constantly be trying to re-populate
> > the dirty ring with pages that we've seen the user consume, which
> > doesn't seem like a good fit versus a bitmap solution.
> > Thanks,
>
> Right, so I confess I don't know whether dirty ring is the ideal
> solution for vfio either.  Actually if we're tracking by page maps or
> pinnings, then IMHO it also means that it could be more suitable to
> use a modified version of dirty ring buffer (as you suggested in the
> other thread), in that we can track dirty using (addr, len) range
> rather than a single page address.  That could be hard for KVM because
> in KVM the page will be mostly trapped in 4K granularity in page
> faults, and it'll also be hard to merge contiguous entries with
> previous ones because the userspace could be reading the entries (so
> after we publish the previous 4K dirty page, we should not modify the
> entry any more).

An easy way would be to keep a couple of entries around, not pushing
them into the ring until later.  In fact, deferring the queue write
until there's a batch of data to push is a very handy optimization.

When building UAPIs it makes sense to try and keep them generic
rather than tying them to a given implementation.

That's one of the reasons I called for using something
resembling vring_packed_desc.

> VFIO should not have this restriction because the
> marking of dirty page range can be atomic when the range of pages are
> mapped or pinned.
>
> Thanks,
>
> --
> Peter Xu
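
To illustrate the "keep a couple of entries around" idea, here is a
rough single-threaded sketch, not code from the series.  All names
(dirty_range, dirty_ring, ring_mark_dirty, ring_flush) and the ring
size are made up for the example, and the (addr, len) layout is only
an assumption, not any real UAPI.  The producer holds one private
pending entry, grows it while new 4K pages are contiguous with it, and
only publishes it into the shared ring once a non-contiguous page
shows up or a flush is requested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u
#define RING_SIZE    8u     /* hypothetical, tiny on purpose */

/* Hypothetical (addr, len) entry; not the layout of any real UAPI. */
struct dirty_range {
    uint64_t addr;
    uint64_t len;
};

struct dirty_ring {
    struct dirty_range entries[RING_SIZE]; /* visible to the consumer */
    uint32_t head;                         /* next slot to publish into */
    struct dirty_range pending;            /* private, still mutable */
    bool has_pending;
};

/*
 * Publish the pending entry, if any.  Once it is in entries[] it is
 * never modified again.  A real shared ring would also need a release
 * barrier before the head update; that is omitted in this sketch.
 */
static bool ring_flush(struct dirty_ring *r)
{
    if (!r->has_pending)
        return true;
    if (r->head == RING_SIZE)
        return false;   /* ring full: caller has to wait for the reader */
    r->entries[r->head++] = r->pending;
    r->has_pending = false;
    return true;
}

/* Record one dirty 4K page, merging it into the pending range if possible. */
static bool ring_mark_dirty(struct dirty_ring *r, uint64_t addr)
{
    if (r->has_pending && r->pending.addr + r->pending.len == addr) {
        r->pending.len += PAGE_SIZE_4K;   /* contiguous: extend in place */
        return true;
    }
    if (!ring_flush(r))                   /* not contiguous: publish old one */
        return false;
    r->pending.addr = addr;
    r->pending.len = PAGE_SIZE_4K;
    r->has_pending = true;
    return true;
}
```

With this, marking 0x1000, 0x2000 and 0x3000 in sequence leaves the
ring untouched and a single pending range of three pages; the range is
only published when a page elsewhere is marked or ring_flush() is
called, e.g. before the consumer is notified.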