Re: [PATCH 1/2] migration: Implement dirty ring

Shota Imamura <cosocaf@xxxxxxxxx> · Tue, 25 Jun 2024 20:10:23 +0900

Dear Peter Xu,

Thank you for your feedback.

> It looks like this patch will introduce a ring but still it keeps the
> bitmaps around.
>
> Could you elaborate your motivation of this work? It’ll be interesting to
> know whether you did any kind of measurement around it.

First of all, I apologize for the lack of explanation.
To provide more details, the motivation for this work stems from the
current QEMU implementation, where pages obtained from the KVM ring are set
into the KVMSlot/RAMList/RAMBlock bitmaps. Consequently, when the migration
thread sends data, it ends up scanning the bitmap (resulting in O(N) time
complexity). I aimed to improve this situation.

Here are the steps and considerations in my implementation plan:

1. Call MigrationOps::ram_save_target_page inside kvm_dirty_ring_reap_one.

The approach involves QEMU holding neither bitmaps nor rings and sending
immediately. However, this would require non-migration threads (like accel
threads) to send pages, necessitating a synchronization mechanism with the
migration thread, which I deemed would increase code complexity.
Additionally, if future non-KVM accels provided their rings, we would have
to write similar code in different places, increasing future maintenance
costs. For these reasons, I decided against an implementation where page
sending occurs within accel/kvm and opted for a separate ring within QEMU.

2. Use a ring as an alternative to bitmaps.

The approach involves implementing a ring within QEMU and inserting pages
into the ring instead of setting them into bitmaps in functions like
kvm_dirty_ring_mark_page, cpu_physical_memory_set_dirty_lebitmap, and
cpu_physical_memory_set_dirty_range. Then, pages are retrieved from the
ring in ram_find_and_save_block. However, this approach necessitates
immediate sending of pages when the ring is full, which might involve
non-migration threads sending pages, leading to the same issues as
mentioned in step 1. Given the ring has a limited capacity, if there are
enough dirty pages to fill the ring, the cost difference between operating
the ring and scanning the entire bitmap would be negligible.
Hence, I decided to fall back to bitmaps when the ring is full.

3. Use bitmaps when the ring is full.

The approach involves setting bitmaps while simultaneously inserting pages
into the ring in functions like kvm_dirty_ring_mark_page,
cpu_physical_memory_set_dirty_lebitmap, and
cpu_physical_memory_set_dirty_range. If the ring is not full, it is used in
ram_find_and_save_block; otherwise, bitmaps are used. This way, only the
migration thread sends pages. Additionally, by checking if a page is
already in the ring (O(1) complexity), redundant entries are avoided.
However, enqueuing and dequeuing are handled by different threads, which
could result in a situation where pages exist in the bitmap but not in the
ring once it is full. Identifying these pages would require locking and
scanning the entire bitmap.

4. Use two rings to revert to the ring after falling back to the bitmap
within a round.

As mentioned earlier, once the fallback to the bitmap occurs, pages that
get dirty after the ring is full cannot be captured by the ring. This would
necessitate using bitmaps until the final round or periodically locking and
scanning the entire bitmap to synchronize with the ring. To improve this, I
considered using two rings: one for enqueueing and one for dequeuing. Pages
are inserted into the enqueue ring in functions like
kvm_dirty_ring_mark_page and cpu_physical_memory_set_dirty_range, and the
rings are swapped in migration_sync_bitmap, with pages being retrieved in
ram_find_and_save_block. This way, each call to migration_sync_bitmap (or
ram_state_pending_exact) determines whether to use the ring or the bitmap
in subsequent rounds.

Based on this reasoning, I implemented a system that combines bitmaps and
rings.

Regarding performance, my local environment might be insufficient for
proper measurement, but I obtained the following results by migrating after
booting the latest Linux and Buildroot with an open login shell:

Commands used:
```
# src
sudo ./qemu-system-x86_64 \
-accel kvm,dirty-ring-size=1024 \
-m 8G \
-boot c \
-kernel ~/path/to/linux/arch/x86/boot/bzImage \
-hda ~/path/to/buildroot/output/images/rootfs.ext4 \
-append "root=/dev/sda rw console=ttyS0,115200 acpi=off" \
-nographic \
-migration dirty-logging=ring,dirty-ring-size=1024

# dst
sudo ./qemu-system-x86_64 \
-accel kvm,dirty-ring-size=1024 \
-m 8G \
-boot c \
-kernel ~/path/to/linux/arch/x86/boot/bzImage \
-hda ~/path/to/buildroot/output/images/rootfs.ext4 \
-append "root=/dev/sda rw console=ttyS0,115200 acpi=off" \
-nographic \
-incoming tcp:0:4444

# hmp
migrate_set_parameter max-bandwidth 1250
migrate tcp:0:4444
info migrate
```

Results for each memory size, measured 5 times:
```
# ring -m 8G
total time: 418 ms
total time: 416 ms
total time: 415 ms
total time: 416 ms
total time: 416 ms

# bitmap -m 8G
total time: 434 ms
total time: 421 ms
total time: 423 ms
total time: 430 ms
total time: 429 ms

# ring -m 16G
total time: 847 ms
total time: 852 ms
total time: 850 ms
total time: 848 ms
total time: 852 ms

# bitmap -m 16G
total time: 860 ms
total time: 862 ms
total time: 858 ms
total time: 859 ms
total time: 861 ms

# ring -m 32G
total time: 1616 ms
total time: 1625 ms
total time: 1612 ms
total time: 1612 ms
total time: 1630 ms

# bitmap -m 32G
total time: 1714 ms
total time: 1724 ms
total time: 1718 ms
total time: 1714 ms
total time: 1714 ms

# ring -m 64G
total time: 3451 ms
total time: 3452 ms
total time: 3449 ms
total time: 3451 ms
total time: 3450 ms

# bitmap -m 64G
total time: 3550 ms
total time: 3553 ms
total time: 3552 ms
total time: 3550 ms
total time: 3553 ms

# ring -m 96G
total time: 5185 ms
total time: 5186 ms
total time: 5183 ms
total time: 5191 ms
total time: 5191 ms

# bitmap -m 96G
total time: 5385 ms
total time: 5388 ms
total time: 5386 ms
total time: 5392 ms
total time: 5592 ms
```

It is natural that the implemented ring completes migration faster for all
memory sizes, given that the conditions favor the ring due to the minimal
memory workload. By the way, these are total migration times, with much of
the overhead attributed to page sending and other IO operations.

I plan to conduct more detailed measurements going forward, but if you have
any recommendations for good measurement methods, please let me know.

> I remember adding such option is not suggested. We may consider using
> either QMP to setup a migration parameter, or something else.

I apologize for implementing this without knowledge of QEMU's policy.
I will remove this option and instead implement it using
migrate_set_parameter or migrate_set_capability. Is this approach
acceptable?

This is my first time contributing to QEMU, so I appreciate your guidance.

Thank you and best regards,

--
Shota Imamura

2024年6月25日(火) 4:08 Peter Xu <peterx@xxxxxxxxxx>:

>
> Hi, Shota,
>
> On Thu, Jun 20, 2024 at 06:47:13PM +0900, Shota Imamura wrote:
> > This commit implements the dirty ring as an alternative dirty tracking
> > method to the dirty bitmap.
> >
> > While the dirty ring has already been implemented in accel/kvm using KVM's
> > dirty ring, it was designed to set bits in the ramlist and ramblock bitmap.
> > This commit introduces a new dirty ring to replace the bitmap, allowing the
> > use of the dirty ring even without KVM. When using KVM's dirty ring, this
> > implementation maximizes its effectiveness.
>
> It looks like this patch will introduce a ring but still it keeps the
> bitmaps around.
>
> Could you elaborate your motivation of this work?  It'll be interesting to
> know whether you did any kind of measurement around it.
>
> >
> > To enable the dirty ring, specify the startup option
> > "-migration dirty-logging=ring,dirty-ring-size=N". To use the bitmap,
> > either specify nothing or "-migration dirty-logging=bitmap". If the dirty
> > ring becomes full, it falls back to the bitmap for that round.
>
> I remember adding such option is not suggested.  We may consider using
> either QMP to setup a migration parameter, or something else.
>
> Thanks,
>
> --
> Peter Xu
>