Hi Peter and Marc,
On 11/7/22 5:06 AM, Peter Xu wrote:
On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
On Sun, 06 Nov 2022 16:22:29 +0000,
Peter Xu <peterx@xxxxxxxxxx> wrote:
On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
+Note that the bitmap here is only a backup of the ring structure, and
+normally should only contain a very small amount of dirty pages, which
I don't think we can claim this. It is whatever amount of memory is
dirtied outside of a vcpu context, and we shouldn't make any claim
regarding the number of dirty pages.
The thing is that the current with-bitmap design assumes the two logs are
collected in different windows of the migration: the rings while the VM is
live, and the dirty bitmap only after the VM is stopped. So collecting the
dirty bitmap and sending the pages it marks will be part of the VM downtime.
That stops making sense if the dirty bitmap can cover a large portion of
the guest memory, because then it would be simpler to just stop the VM,
transfer the pages, and restart on the destination node without any
tracking mechanism.
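To make that split concrete, a minimal sketch of the intended flow could
look like the below, where migration_converged(), harvest_dirty_rings(),
send_dirty_pages(), stop_all_vcpus() and collect_backup_bitmaps() are
hypothetical VMM helpers rather than existing KVM or QEMU APIs; only
KVM_RESET_DIRTY_RINGS is a real ioctl:

  #include <stdbool.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Hypothetical VMM helpers; only the ioctl below is part of KVM's API. */
  bool migration_converged(void);
  void harvest_dirty_rings(int vm_fd);    /* read kvm_dirty_gfn entries */
  void send_dirty_pages(void);
  void stop_all_vcpus(void);
  void collect_backup_bitmaps(int vm_fd); /* KVM_GET_DIRTY_LOG per slot */

  void migrate(int vm_fd)
  {
          /* Live window: only the per-vcpu rings are consumed. */
          while (!migration_converged()) {
                  harvest_dirty_rings(vm_fd);
                  ioctl(vm_fd, KVM_RESET_DIRTY_RINGS);
                  send_dirty_pages();
          }

          /* Downtime window: the VM is stopped, nothing dirties memory. */
          stop_all_vcpus();
          harvest_dirty_rings(vm_fd);     /* drain the rings a last time */

          /* The backup bitmap is collected once, as the very last step. */
          collect_backup_bitmaps(vm_fd);
          send_dirty_pages();
  }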
Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
in general. It only makes sense if the source of the dirty pages is
limited to the vcpus, which is literally a corner case. Look at any
real machine, and you'll quickly realise that this isn't the case, and
that DMA *is* a huge source of dirty pages.
Here, we're just lucky enough not to have much DMA tracking yet. Once
that happens (and I have it from people doing the actual work that it
*is* happening), you'll realise that the dirty ring story is of very
limited use. So I'd rather drop anything quantitative here, as this is
likely to be wrong.
Is it a must that arm64 track device DMA through the same dirty tracking
interface rather than through VFIO or some other interface? That's
definitely not the case for x86, but if it is true for arm64, could the
DMA then be spread across all of the guest pages? If that's also true, I
really don't know how this will work...
With the current protocol we only sync the dirty bitmap once. If that sync
can cover most of the guest memory, the migration is effectively non-live.
If we sync it periodically instead, it's the same as enabling dirty-log
alone and the rings become useless.
For the vgic/its tables, the number of dirty pages can be huge in theory.
However, it is limited in practice. So I tend to agree with Peter that the
dirty ring should be avoided and the dirty log used instead when the DMA
case is supported in the future. As Peter said, a small number of dirty
pages in the bitmap is the precondition for using it here, and I think it
makes sense to state that in the document.
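If we want to be defensive about that precondition, the VMM could also
count the bits set in the returned bitmap and warn, or fall back to a
fully non-live final phase, when the count is unexpectedly large. A
minimal sketch, assuming the 64-bit-word-aligned bitmap that
KVM_GET_DIRTY_LOG fills in (KVM leaves the trailing bits beyond nr_pages
clear):

  #include <stdint.h>

  /* Count the pages marked dirty in a bitmap filled by KVM_GET_DIRTY_LOG. */
  static unsigned long count_dirty_pages(const uint64_t *bitmap,
                                         unsigned long nr_pages)
  {
          unsigned long i, count = 0;

          for (i = 0; i < (nr_pages + 63) / 64; i++)
                  count += __builtin_popcountll(bitmap[i]);

          return count;
  }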
[1]
+needs to be transferred during VM downtime. Collecting the dirty bitmap
+should be the very last thing that the VMM does before transmitting state
+to the target VM. The VMM needs to ensure that the dirty state is final,
+so that no dirty pages are missed due to another ioctl ordered after the
+bitmap collection.
+
+To collect dirty bits from the backup bitmap, userspace can use the same
+KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed, and its
+behavior is undefined, since collecting the dirty bitmap always happens
+in the last phase of the VM's migration.
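For reference, collecting the backup bitmap for one memslot would look
roughly like the sketch below, following the usual KVM_GET_DIRTY_LOG
contract of one bit per page, rounded up to 64-bit words; error handling
is trimmed:

  #include <stdint.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Returns a caller-owned bitmap with one bit per page, or NULL on error. */
  static void *get_backup_bitmap(int vm_fd, uint32_t slot, uint64_t nr_pages)
  {
          struct kvm_dirty_log log = { .slot = slot };
          size_t len = ((nr_pages + 63) / 64) * 8;

          log.dirty_bitmap = calloc(1, len);
          if (!log.dirty_bitmap)
                  return NULL;

          if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
                  free(log.dirty_bitmap);
                  return NULL;
          }

          return log.dirty_bitmap;
  }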
It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
you have multiple devices that dirty the memory, such as multiple
ITSs, why shouldn't userspace be allowed to snapshot the dirty state
multiple times? This doesn't seem like a reasonable restriction, and I
really dislike the idea of undefined behaviour here.
I suggested the paragraph because it's very natural to ask whether we'd
need to CLEAR_LOG for this special GET_LOG phase, so I thought this could
be helpful as a reference to answer that.
I wanted to make it clear that we don't need CLEAR_LOG at all in this
case. Fundamentally, clearing the log is about re-protecting the guest
pages, but under the restriction in [1] (the dirty bitmap being the last
thing collected, once and for all) there's no point in protecting guest
pages at this stage: the source host shouldn't run the VM after the
GET_LOG, so a CLEAR_LOG would be wasted effort.
That's not for you to decide, but userspace. I can perfectly expect
userspace saving an ITS, getting the bitmap, saving the pages and then
*clearing the log* before processing the next ITS. Or anything else.
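That per-device sequence would look roughly like the following sketch on
arm64, where save_pages_in_bitmap() is a hypothetical helper; the ITS
tables are saved through the KVM_DEV_ARM_ITS_SAVE_TABLES device attribute,
and KVM_CLEAR_DIRTY_LOG additionally requires
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 to be enabled:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Hypothetical helper: transfer the pages marked in the bitmap. */
  void save_pages_in_bitmap(uint32_t slot, const void *bitmap);

  static void save_one_its(int its_fd, int vm_fd, uint32_t slot,
                           uint32_t nr_pages, void *bitmap)
  {
          struct kvm_device_attr attr = {
                  .group = KVM_DEV_ARM_VGIC_GRP_CTRL,
                  .attr  = KVM_DEV_ARM_ITS_SAVE_TABLES,
          };
          struct kvm_dirty_log log = { .slot = slot, .dirty_bitmap = bitmap };
          struct kvm_clear_dirty_log clear = {
                  .slot         = slot,
                  .num_pages    = nr_pages,
                  .first_page   = 0,
                  .dirty_bitmap = bitmap,
          };

          ioctl(its_fd, KVM_SET_DEVICE_ATTR, &attr); /* save the ITS tables */
          ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);     /* snapshot the bitmap */
          save_pages_in_bitmap(slot, bitmap);        /* transfer those pages */
          ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear); /* re-arm for the next ITS */
  }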
I think I get your point on why you're not happy with the document, but
IMHO how we document it is one thing and how it works is another. I
preferred explicit documentation because it helps application developers
support the interface, and it gives us more documentation to reference in
the future; no strong opinion, though.
However, if a fundamental statement there is literally wrong, that's
another matter, and we may need to rethink.
How about avoiding any mention of KVM_CLEAR_DIRTY_LOG here? I don't expect
QEMU to clear the dirty bitmap after it's collected in this particular case.
Thanks,
Gavin