On Tue, Oct 30, 2012 at 06:53:31PM +0000, Benoit Hudzia wrote:
> Hi Isaku,
>
> Are you going to be at the KVM Forum (I think you have a presentation
> there)? It would be nice if we could meet in order to see whether we
> can sync our efforts.

Yes, definitely.

> As you know, we have been developing an RDMA-based solution for
> post-copy migration, and we demonstrated the initial proof of concept
> in December 2011 (we published some findings in VHPC 2012 and are
> working with Petter Svard from Umea on a journal paper with a more
> detailed performance review).

Do you have any pointers to available papers/slides?
I can't find any at http://vhpc.org/

> While RDMA post-copy live migration is just a by-product of our
> long-term effort (I will present the project in my talk at the KVM
> Forum), we grabbed the opportunity to address problems we were facing
> with the live migration of enterprise workloads, namely how to migrate
> an in-memory database such as HANA under load.
>
> We quickly discovered that pre-copy (even with optimization) didn't
> work with such workloads. We also tried your code; however, the
> performance was far from satisfying with large VMs under load, due to
> the heavy cost of transferring memory between user space and the
> kernel multiple times (actually, it often failed).

If possible, I'd like to see the details.

> We then tested a pure RDMA solution we developed (we support both
> hardware and software RDMA), and it worked fine with all the workloads
> we tested (we migrated VMs with 20+ GB running SAP HANA under a
> workload similar to TPC-H). We hope to test with bigger configurations
> soon (1/2+ TB of memory).
>
> However, the integration of our code with the QEMU code base is not as
> advanced and polished as what you currently have, and I would like to
> know whether you would be interested in joining our effort or
> collaborating on merging our solutions. Or maybe allowing us to
> piggyback on your effort.

Yeah, we can unite our efforts for the upstream. In particular, a clean
interface that covers both the non-RDMA/RDMA and the qemu-internal/
qemu-kernel cases is important. At the moment I have no clear picture
of the requirements of RDMA postcopy or of your implementation;
"transparently integrating with the MMU at the OS level" sounds
interesting.

thanks,

> Would you be free to meet at any time next week (Tuesday to Friday)?
>
> PS: we will be open-sourcing our project by the end of November, and
> the post-copy is only a small part of the technology developed.
>
> Regards
> Benoit
>
> On 30 October 2012 08:32, Isaku Yamahata <yamahata@xxxxxxxxxxxxx> wrote:
> This is the v3 patch series of postcopy migration.
>
> The trees are available at
> git://github.com/yamahata/qemu.git qemu-postcopy-oct-30-2012
> git://github.com/yamahata/linux-umem.git linux-umem-oct-29-2012
>
> Major changes v2 -> v3:
> - implemented the pre+post optimization
> - automatic detection of postcopy by the incoming side
> - use threads on the destination instead of fork
> - use blocking I/O instead of a select + non-blocking I/O loop
> - less memory overhead
> - various improvements and code simplification
> - kernel module renamed umem -> uvmem to avoid a name conflict
>
> Patch organization:
> 1-2: trivial fixes
> 3-5: preparation for threading,
>      cherry-picked from the migration tree
> 6-18: refactoring of existing code and preparation
> 19-25: postcopy live migration itself (the essential part)
> 26-35: optimizations/heuristics for postcopy
>
> Usage
> =====
> You need to load the uvmem character device on the host before
> starting migration. Postcopy can be used with both the tcg and kvm
> accelerators. The implementation depends only on the Linux uvmem
> character device, but the driver-dependent code is split out into
> its own file.
> I tested only the host page size == guest page size case, but the
> implementation allows the host page size != guest page size case.
>
> The following options are added with this patch series.
> - incoming part
>   Use -incoming as usual. Postcopy is automatically detected.
>   example:
>   qemu -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>
> - outgoing part
>   options for the migrate command:
>   migrate [-p [-n] [-m]] URI
>           [<precopy count> [<prefault forward> [<prefault backward>]]]
>
>   Newly added options/arguments:
>   -p: use postcopy migration
>   -n: disable background page transfer (for benchmarking/debugging)
>   -m: move background transfer of postcopy mode
>   <precopy count>: the number of precopy RAM scans before postcopy;
>                    default 0 (0 means no precopy)
>   <prefault forward>: the number of forward pages sent together with
>                       each on-demand page
>   <prefault backward>: the number of backward pages sent together with
>                        each on-demand page
>
>   examples:
>   migrate -p -n tcp:<dest ip address>:4444
>   migrate -p -n -m tcp:<dest ip address>:4444 42 42 0
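For concreteness, below is a minimal, hypothetical C sketch of the
incoming-side setup that the work flow section later in this mail
describes (open /dev/uvmem, UVMEM_INIT, then mapping guest RAM and the
shmem backing file). The layout of struct uvmem_init and the ioctl
number are assumptions made for illustration only; the real definitions
live in linux/uvmem.h from the series.

/*
 * Hypothetical sketch of the destination-side uvmem setup.  The struct
 * layout and ioctl number below are assumed for illustration; see
 * linux/uvmem.h in the series for the real interface.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

struct uvmem_init {              /* assumed layout */
    unsigned long size;          /* in: size of the guest RAM region */
    int shmem_fd;                /* out: fd of the backing shmem file */
};
#define UVMEM_INIT _IOWR('u', 0, struct uvmem_init)  /* assumed number */

int main(void)
{
    size_t ram_size = 1UL << 30;   /* e.g. 1 GiB of guest RAM */

    int uvmem_fd = open("/dev/uvmem", O_RDWR);
    if (uvmem_fd < 0) {
        perror("open /dev/uvmem");
        return 1;
    }

    struct uvmem_init init = { .size = ram_size };
    if (ioctl(uvmem_fd, UVMEM_INIT, &init) < 0) {
        perror("UVMEM_INIT");
        return 1;
    }
    /* We now hold the two file descriptors from the work flow:
     * the uvmem device and the shmem file. */

    /* qemu maps the uvmem device as guest RAM; a fault on a page that
     * has not been served yet blocks until a umem thread resolves it. */
    void *guest_ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, uvmem_fd, 0);
    if (guest_ram == MAP_FAILED) {
        perror("mmap uvmem");
        return 1;
    }

    /* The umem threads map the shmem file as the page cache and then
     * close their copy of the fd, as in the diagram below. */
    void *shmem = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, init.shmem_fd, 0);
    if (shmem == MAP_FAILED) {
        perror("mmap shmem");
        return 1;
    }
    close(init.shmem_fd);

    /* ... create the umem threads and hand over the migration socket ... */
    return 0;
}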
> TODO
> ====
> - benchmark/evaluation
> - improve/optimize
>   At the moment, at least what I'm aware of is:
>   - pre+post case
>     On the destination side, reading the dirty bitmap would cause long
>     latency; create a thread for that.
> - consider the FUSE/CUSE possibility
>
> basic postcopy work flow
> ========================
> qemu on the destination
>       |
>       V
> open(/dev/uvmem)
>       |
>       V
> UVMEM_INIT
>       |
>       V
> Here we have two file descriptors to
> the uvmem device and the shmem file
>       |
>       |                                 umem threads
>       |                                 on the destination
>       |
>       V   create pipe to communicate
> create threads------------------------------,
>       |                                     |
>       V                                mmap(shmem file)
> mmap(uvmem device) for guest RAM       close(shmem file)
>       |                                     |
>       V                                     |
> wait for ready from daemon <----pipe---send ready message
>       |                                     |
>       |                           Here the daemon takes over
> send ok------------pipe-------------> the ownership of the socket
>       |                                to the source
>       V                                     |
> enter postcopy stage                        |
> start guest execution                       |
>       |                                     |
>       V                                     V
> access guest RAM                       read() to get faulted pages
>       |                                     |
>       V                                     V
> page fault --------------------------->page offset is returned
> block                                       |
>                                             V
>                                        pull page from the source
>                                        write the page contents
>                                        to the shmem
>                                             |
>                                             V
> unblock <---------------------------write() to tell served pages
> the fault handler returns the page          |
> page fault is resolved                      |
>       |                                     V
>       |                                touch guest RAM pages
>       |                                     |
>       |                                     V
>       |                                release the cached page
>       |                                madvise(MADV_REMOVE)
>       |                                     |
>       |                                     |
>       |                                pages can also be sent
>       |                                in the background
>       |                                     |
>       |                                     V
>       |                                mark page as cached, so
>       |                                future page faults are
>       |                                avoided
>       |                                     |
>       |                                     V
>       |                                touch guest RAM pages
>       |                                     |
>       |                                     V
>       |                                release the cached page
>       |                                madvise(MADV_REMOVE)
>       |                                     |
>       V                                     V
>
> all the pages are pulled from the source
>
>       |                                     |
>       V                                     V
> migration completes                      exit()
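To illustrate the right-hand column of the diagram, here is a similarly
hedged sketch of one umem serving thread. Only the read()/write()
protocol on the uvmem fd and the madvise(MADV_REMOVE) release step are
taken from the work flow above; pull_page_from_source() and the
page-offset encoding are placeholders, not the series' actual code.

#define _GNU_SOURCE   /* for MADV_REMOVE on older glibc */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_FAULTS 32

/* Placeholder for the real transport: the series pulls the page from
 * the outgoing qemu over the migration socket. */
static void pull_page_from_source(uint64_t pgoff, void *buf, size_t page_size)
{
    (void)pgoff;
    memset(buf, 0, page_size);
}

void serve_faults(int uvmem_fd, char *shmem, size_t page_size)
{
    uint64_t pgoffs[MAX_FAULTS];

    for (;;) {
        /* Blocks until a vcpu faults on a not-yet-served page; the
         * driver reports the offsets of the faulted pages. */
        ssize_t n = read(uvmem_fd, pgoffs, sizeof(pgoffs));
        if (n <= 0) {
            break;   /* all pages pulled from the source: thread exits */
        }

        size_t nr = (size_t)n / sizeof(pgoffs[0]);
        for (size_t i = 0; i < nr; i++) {
            /* Pull the page from the source and write its contents
             * into the shmem file backing the guest RAM. */
            pull_page_from_source(pgoffs[i],
                                  shmem + pgoffs[i] * page_size, page_size);
        }

        /* Tell the driver which pages were served; this unblocks the
         * fault handler so the guest's page fault is resolved. */
        if (write(uvmem_fd, pgoffs, nr * sizeof(pgoffs[0])) < 0) {
            break;
        }

        /* Once the guest mapping has the page, the cached copy in the
         * shmem file can be released to cap memory overhead. */
        for (size_t i = 0; i < nr; i++) {
            madvise(shmem + pgoffs[i] * page_size, page_size, MADV_REMOVE);
        }
    }
}

Per the diagram, the same path also handles pages sent in the
background, marking them as cached so that future page faults on them
are avoided.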
>
> Isaku Yamahata (32):
>   migration.c: remove redundant line in migrate_init()
>   arch_init: DPRINTF format error and typo
>   osdep: add qemu_read_full() to read interrupt-safely
>   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip,
>     qemu_fflush
>   savevm/QEMUFile: consolidate QEMUFile functions a bit
>   savevm/QEMUFile: introduce qemu_fopen_fd
>   savevm/QEMUFile: add read/write QEMUFile on memory buffer
>   savevm, buffered_file: introduce method to drain buffer of buffered
>     file
>   arch_init: export RAM_SAVE_xxx flags for postcopy
>   arch_init/ram_save: introduce constant for ram save version = 4
>   arch_init: refactor ram_save_block() and export ram_save_block()
>   arch_init/ram_save_setup: factor out bitmap alloc/free
>   arch_init/ram_load: refactor ram_load
>   arch_init: factor out logic to find ram block with id string
>   migration: export migrate_fd_completed() and migrate_fd_cleanup()
>   uvmem.h: import Linux uvmem.h and teach update-linux-headers.sh
>   osdep: add QEMU_MADV_REMOVE and trivial fix
>   postcopy: introduce helper functions for postcopy
>   savevm: add new section that is used by postcopy
>   postcopy: implement incoming part of postcopy live migration
>   postcopy outgoing: add -p option to migrate command
>   postcopy: implement outgoing part of postcopy live migration
>   postcopy/outgoing: add -n option to disable background transfer
>   postcopy/outgoing: implement forward/backward prefault
>   arch_init: factor out setting last_block, last_offset
>   postcopy/outgoing: add movebg mode (-m) to migration command
>   arch_init: factor out ram_load
>   arch_init: export ram_save_iterate()
>   postcopy: pre+post optimization incoming side
>   arch_init: export migration_bitmap_sync and helper method to get
>     bitmap
>   postcopy/outgoing: introduce precopy_count parameter
>   postcopy: pre+post optimization outgoing side
>
> Paolo Bonzini (1):
>   split MRU ram list
>
> Umesh Deshpande (2):
>   add a version number to ram_list
>   protect the ramlist with a separate mutex
>
>  Makefile.target                 |    2 +
>  arch_init.c                     |  391 +++++---
>  arch_init.h                     |   24 +
>  buffered_file.c                 |   59 +-
>  buffered_file.h                 |    1 +
>  cpu-all.h                       |   16 +-
>  exec.c                          |   62 +-
>  hmp-commands.hx                 |   21 +-
>  hmp.c                           |   12 +-
>  linux-headers/linux/uvmem.h     |   41 +
>  migration-exec.c                |    8 +-
>  migration-fd.c                  |   23 +-
>  migration-postcopy.c            | 2019 +++++++++++++++++++++++++++++++++++++++
>  migration-tcp.c                 |   16 +-
>  migration-unix.c                |   36 +-
>  migration.c                     |   65 +-
>  migration.h                     |   42 +
>  osdep.c                         |   24 +
>  osdep.h                         |   13 +-
>  qapi-schema.json                |    6 +-
>  qemu-common.h                   |    2 +
>  qemu-file.h                     |   12 +-
>  qmp-commands.hx                 |    4 +-
>  savevm.c                        |  223 ++++-
>  scripts/update-linux-headers.sh |    2 +-
>  sysemu.h                        |    2 +-
>  umem.c                          |  291 ++++++
>  umem.h                          |   88 ++
>  vl.c                            |    5 +-
>  29 files changed, 3265 insertions(+), 245 deletions(-)
>  create mode 100644 linux-headers/linux/uvmem.h
>  create mode 100644 migration-postcopy.c
>  create mode 100644 umem.c
>  create mode 100644 umem.h
>
> --
> 1.7.10.4

> --
> "The production of too many useful things results in too many useless
> people"

--
yamahata