Hi Peter, On Sun, 06 Nov 2022 16:22:29 +0000, Peter Xu <peterx@xxxxxxxxxx> wrote: > > Hi, Marc, > > On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote: > > > +Note that the bitmap here is only a backup of the ring structure, and > > > +normally should only contain a very small amount of dirty pages, which > > > > I don't think we can claim this. It is whatever amount of memory is > > dirtied outside of a vcpu context, and we shouldn't make any claim > > regarding the number of dirty pages. > > The thing is the current with-bitmap design assumes that the two logs are > collected in different windows of migration, while the dirty log is only > collected after the VM is stopped. So collecting dirty bitmap and sending > the dirty pages within the bitmap will be part of the VM downtime. > > It will stop to make sense if the dirty bitmap can contain a large portion > of the guest memory, because then it'll be simpler to just stop the VM, > transfer pages, and restart on dest node without any tracking mechanism. Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense in general. It only makes sense if the source of the dirty pages is limited to the vcpus, which is literally a corner case. Look at any real machine, and you'll quickly realise that this isn't the case, and that DMA *is* a huge source of dirty pages. Here, we're just lucky enough not to have much DMA tracking yet. Once that happens (and I have it from people doing the actual work that it *is* happening), you'll realise that the dirty ring story is of very limited use. So I'd rather drop anything quantitative here, as this is likely to be wrong. > > [1] > > > > > > +needs to be transferred during VM downtime. Collecting the dirty bitmap > > > +should be the very last thing that the VMM does before transmitting state > > > +to the target VM. VMM needs to ensure that the dirty state is final and > > > +avoid missing dirty pages from another ioctl ordered after the bitmap > > > +collection. > > > + > > > +To collect dirty bits in the backup bitmap, the userspace can use the > > > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed > > > +and its behavior is undefined since collecting the dirty bitmap always > > > +happens in the last phase of VM's migration. > > > > It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If > > you have multiple devices that dirty the memory, such as multiple > > ITSs, why shouldn't userspace be allowed to snapshot the dirty state > > multiple time? This doesn't seem like a reasonable restriction, and I > > really dislike the idea of undefined behaviour here. > > I suggested the paragraph because it's very natural to ask whether we'd > need to CLEAR_LOG for this special GET_LOG phase, so I thought this could > be helpful as a reference to answer that. > > I wanted to make it clear that we don't need CLEAR_LOG at all in this case, > as fundamentally clear log is about re-protect the guest pages, but if > we're with the restriction of above (having the dirty bmap the last to > collect and once and for all) then it'll make no sense to protect the guest > page at all at this stage since src host shouldn't run after the GET_LOG > then the CLEAR_LOG will be a vain effort. That's not for you to decide, but userspace. I can perfectly expect userspace saving an ITS, getting the bitmap, saving the pages and then *clearing the log* before processing the next ITS. Or anything else. And by the way, userspace is perfectly entitled to *restart* the VM on the spot if it wants to. After all, there is absolutely nothing that says "migration" here. You are reading it all over the place, but that's not what the API is about. Frankly, I don't see why we should put random limitation to this API. We're not in the business of setting policies on what userspace does. > > I used "undefined" here just to be loose on the interface, also a hint that > we should never do that for this specific GET_LOG. If we want, we can > ignore CLEAR_LOG in the future with ALLOW_BITMAP, and the undefined also > provides the flexibility, but that's not really that important. > > The wording could definitely be improved, or maybe even avoid mentioning > the CLEAR_LOG might help, but IIUC the major thing to reach the consensus > is not CLEAR_LOG itself but on whether we can have that assumption [1] and > whether such a design of using dirty bmap is acceptable in general. I don't know what [1] is, but the bitmap should behave correctly, no matter what userspace does, and provide consistent results. We already depend on this. If the current API cannot support this correctly, then we should fix it before I take this series. M. -- Without deviation from the norm, progress is not possible.