Takuya Yoshikawa <yoshikawa.takuya@xxxxxxxxxxxxx> wrote:
> On Tue, 30 Nov 2010 16:27:13 +0200
> Avi Kivity <avi@xxxxxxxxxx> wrote:
>
> Is anyone profiling these dirty bitmap things?

I am.

> - is a 512GB guest really the target?

No, the problems exist with smaller amounts of RAM. With a 16GB guest it is
trivial to get 1s stalls; with a 64GB guest, 3-4s; with more memory,
migration is flaky to say the least.

> - how much cpu time can we use for these things?

The problem here is that we are forced to walk the bitmap too many times;
we want to do it fewer times.

> - how many dirty pages do we have to care about?

With the default values, and assuming 1 Gigabit ethernet for ourselves,
~9.5MB of dirty pages to have only 30ms of downtime. But notice that this
is what we are advertising; we aren't anywhere near there at all.

> Since we are planning to do some profiling for these, taking into account
> Kemari, can you please share this information?

If you look at the 0/10 email with this setup, you can see how much time we
are spending on each thing. Right now (for migration; Kemari is a bit
different) we have to fix other things first. The next item for me is to
improve that bitmap handling: we can at least trivially divide the space
used by 8, and use ffs to find dirty pages (see the first sketch at the end
of this mail). I am thinking about changing the kvm interface once it
becomes the bottleneck (it is not at this point).

>> >>> In the short term, fixing (2) by accounting zero pages as full sized
>> >>> pages should "fix" the problem.
>> >>>
>> >>> In the long term, we need a new dirty bit interface from kvm.ko that
>> >>> uses a multi-level table. That should dramatically improve scan
>> >>> performance.
>> >>
>> >> Why would a multi-level table help? (or rather, please explain what
>> >> you mean by a multi-level table).
>> >>
>> >> Something we could do is divide memory into more slots, and poll
>> >> each slot when we start to scan its page range. That reduces the
>> >> time between sampling a page's dirtiness and sending it off, and
>> >> reduces the latency incurred by the sampling. There are also
>
> If we use the rmap approach with one more interface, we can specify which
> range of the dirty bitmap to get. This has the same effect as splitting
> into more slots.

kvm already allows us to do that today; it is qemu that doesn't use this
information. qemu always asks for the whole memory, while kvm is happy to
give only a range (see the second sketch at the end of this mail).

>> >> non-interface-changing ways to reduce this latency, like O(1) write
>> >> protection, or using dirty bits instead of write protection when
>> >> available.
>
> IIUC, O(1) will lazily write protect pages beginning from the top level?
> Does this have any impact other than the timing of get_dirty_log()?

Dunno. At this point I am trying to:
- get migration of 16-64GB guests to not have stalls.
- get the infrastructure to be able to know what is going on.

So far the bigger stalls are gone, and we are discussing what to do next.

As Anthony suggested, I ran the ram_save_live() loop without qemu_mutex,
and now guests get much better interaction; but my current patch (for this)
just puts qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() around it
(see the last sketch at the end of this mail). I think we are racy with the
access to the bitmap, but it was just a test.

With respect to Kemari, we can discuss what you need and how you are going
to test, just so we don't do overlapping work.

Thanks, Juan.
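
To make the ffs part concrete, here is a minimal sketch of the
bit-per-page scan (the function and callback names are made up for
illustration; this is not the actual qemu code):

#define _GNU_SOURCE
#include <string.h>     /* ffsl() is a glibc extension */
#include <stddef.h>

static void scan_dirty_bitmap(const unsigned long *bitmap,
                              size_t nr_pages,
                              void (*send_page)(size_t page))
{
    size_t bits_per_long = 8 * sizeof(unsigned long);
    size_t nr_longs = (nr_pages + bits_per_long - 1) / bits_per_long;
    size_t i;

    for (i = 0; i < nr_longs; i++) {
        unsigned long word = bitmap[i];

        /* a single compare skips a whole run of 32/64 clean pages */
        while (word) {
            int bit = ffsl(word) - 1;       /* lowest set bit, 0-based */
            size_t page = i * bits_per_long + bit;

            if (page < nr_pages)
                send_page(page);
            word &= word - 1;               /* clear that bit */
        }
    }
}

Compared with a byte per page this divides the bitmap size by 8, and
compared with testing pages one by one the inner loop only does work
proportional to the number of dirty pages.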
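
On the "kvm is happy to give only a range" point: KVM_GET_DIRTY_LOG works
per memory slot, so a caller can fetch just the slot it is about to walk.
A sketch of that call (the helper and its error handling are mine; only
the ioctl and struct kvm_dirty_log are the real interface):

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdlib.h>

/* vm_fd is the VM file descriptor; slot/slot_pages describe an
 * existing memory slot with dirty logging enabled */
static void *get_slot_dirty_log(int vm_fd, int slot, size_t slot_pages)
{
    struct kvm_dirty_log log = { .slot = slot };
    /* kvm pads the bitmap to long granularity; round up to 64 bits */
    size_t bitmap_bytes = ((slot_pages + 63) / 64) * 8;

    log.dirty_bitmap = calloc(1, bitmap_bytes);
    if (!log.dirty_bitmap)
        return NULL;

    if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
        free(log.dirty_bitmap);
        return NULL;
    }
    return log.dirty_bitmap;    /* one bit per page; caller frees */
}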
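
And the qemu_mutex test hack is roughly just this shape, which is why the
bitmap access is racy (the vcpu threads keep dirtying pages and updating
the bitmap while we read it without the lock):

qemu_mutex_unlock_iothread();   /* let the vcpus run while we scan/send */
/* ... walk the dirty bitmap and send pages ... */
qemu_mutex_lock_iothread();     /* retake the global lock before
                                   touching shared state again */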