On Mon, Jun 04, 2012 at 07:27:25AM -0700, Chegu Vinod wrote:
> On 6/4/2012 6:13 AM, Isaku Yamahata wrote:
>> On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
>>> Hello Isaku Yamahata,
>> Hi.
>>
>>> I just saw your patches. Would it be possible to email me a tar bundle of these
>>> patches (makes it easier to apply the patches to a copy of the upstream qemu.git)
>> I uploaded them to github for those who are interested in them.
>>
>> git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
>> git://github.com/yamahata/linux-umem.git linux-umem-june-04-2012
>>
>
> Thanks for the pointer...
>>> BTW, I am also curious if you have considered using any kind of RDMA features for
>>> optimizing the page-faults during postcopy ?
>> Yes, RDMA is an interesting topic. Could you share your use case/concerns/issues?
>
>
> Looking at large sized guests (256GB and higher) running cpu/memory
> intensive enterprise workloads.
> The concerns are the same...i.e. having a predictable total migration
> time, minimal downtime/freeze-time and of course minimal service
> degradation to the workload(s) in the VM or the co-located VM's...
>
> How large of a guest have you tested your changes with and what kind of
> workloads have you used so far ?

Only VMs of up to several GB. Of course we'd like to benchmark with a
really huge VM (several hundred GB), but it's somewhat difficult.

>> Thus we can collaborate.
>> You may want to see Benoit's results.
>
> Yes. 'have already seen some of Benoit's results.

Great.

> Hence the question about use of RDMA techniques for post copy.

So far my implementation doesn't use RDMA.

>> As far as I know, he has not published
>> his code yet.
>
> Thanks
> Vinod
>
>>
>> thanks,
>>
>>> Thanks
>>> Vinod
>>>
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Mon, 4 Jun 2012 18:57:02 +0900
>>> From: Isaku Yamahata <yamahata@xxxxxxxxxxxxx>
>>> To: qemu-devel@xxxxxxxxxx, kvm@xxxxxxxxxxxxxxx
>>> Cc: benoit.hudzia@xxxxxxxxx, aarcange@xxxxxxxxxx, aliguori@xxxxxxxxxx,
>>>     quintela@xxxxxxxxxx, stefanha@xxxxxxxxx, t.hirofuchi@xxxxxxxxxx,
>>>     dlaor@xxxxxxxxxx, satoshi.itoh@xxxxxxxxxx, mdroth@xxxxxxxxxxxxxxxxxx,
>>>     yoshikawa.takuya@xxxxxxxxxxxxx, owasserm@xxxxxxxxxx, avi@xxxxxxxxxx,
>>>     pbonzini@xxxxxxxxxx
>>> Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
>>> Message-ID: <cover.1338802190.git.yamahata@xxxxxxxxxxxxx>
>>>
>>> After a long time, we have v2. This is the qemu part.
>>> The linux kernel part is sent separately.
>>>
>>> Changes v1 -> v2:
>>> - split up patches for review
>>> - buffered file refactored
>>> - many bug fixes
>>>   Especially, PV drivers can work with postcopy
>>> - optimization/heuristic
>>>
>>> Patches
>>>  1 - 30: refactoring existing code and preparation
>>> 31 - 37: implement postcopy itself (essential part)
>>> 38 - 41: some optimization/heuristic for postcopy
>>>
>>> Intro
>>> =====
>>> This patch series implements postcopy live migration.[1]
>>> As discussed at KVM forum 2011, a dedicated character device is used for
>>> distributed shared memory between migration source and destination.
>>> Now we can discuss/benchmark/compare with precopy. I believe there is
>>> much room for improvement.
>>>
>>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>>
>>>
>>> Usage
>>> =====
>>> You need to load the umem character device on the host before starting
>>> migration. Postcopy can be used with the tcg and kvm accelerators. The
>>> implementation depends only on the linux umem character device, but the
>>> driver-dependent code is split
>>> into a file.
>>> I tested only the host page size == guest page size case, but the
>>> implementation allows the host page size != guest page size case.
>>>
>>> The following options are added with this patch series.
>>> - incoming part
>>>   command line options
>>>   -postcopy [-postcopy-flags <flags>]
>>>   where flags changes behavior for benchmark/debugging
>>>   Currently the following flags are available
>>>   0: default
>>>   1: enable touching page request
>>>
>>>   example:
>>>   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>>
>>> - outgoing part
>>>   options for migrate command
>>>   migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
>>>   -p: indicate postcopy migration
>>>   -n: disable background transferring pages: This is for benchmark/debugging
>>>   -m: move background transfer of postcopy mode
>>>   <prefault forward>: The number of forward pages which are sent with
>>>                       an on-demand request
>>>   <prefault backward>: The number of backward pages which are sent with
>>>                        an on-demand request
>>>
>>>   example:
>>>   migrate -p -n tcp:<dest ip address>:4444
>>>   migrate -p -n -m tcp:<dest ip address>:4444 32 0
>>>
>>>
>>> TODO
>>> ====
>>> - benchmark/evaluation. Especially how async page fault affects the result.
>>> - improvement/optimization
>>>   At the moment, at least what I'm aware of is
>>>   - making the incoming socket non-blocking with a thread
>>>     As page compression is coming, it is impractical to do a non-blocking
>>>     read and check if the necessary data has been read.
>>>   - touching pages in the incoming qemu process by fd handler seems
>>>     suboptimal. creating a dedicated thread?
>>>   - the outgoing handler seems suboptimal, causing latency.
>>> - consider the FUSE/CUSE possibility
>>> - don't fork umemd, but create a thread?
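
The <prefault forward> and <prefault backward> arguments quoted above
widen each on-demand request around the faulting page. A minimal sketch
of how such a window could be computed, in Python for brevity. The name
prefault_window, the send order (faulting page first, then forward, then
backward) and the clamping to the guest RAM bounds are assumptions made
for illustration; they are not taken from the patches.

```python
def prefault_window(fault_page, forward, backward, nr_pages):
    """Return the page indexes to send for one on-demand request.

    Hypothetical helper: the ordering and the clamping below are
    assumptions, not the behavior of the posted series.
    """
    if not 0 <= fault_page < nr_pages:
        raise ValueError("faulting page is outside guest RAM")

    # The page the guest is blocked on goes first.
    pages = [fault_page]

    # Forward pages, clamped at the last page of guest RAM.
    last = min(fault_page + forward, nr_pages - 1)
    pages.extend(range(fault_page + 1, last + 1))

    # Backward pages, nearest first, clamped at page 0.
    first = max(fault_page - backward, 0)
    pages.extend(range(fault_page - 1, first - 1, -1))

    return pages
```

With the arguments of the second example command (32 0), a fault on
page 100 of a sufficiently large guest would ask for pages 100 through
132 under these assumptions.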
>>>
>>> basic postcopy work flow
>>> ========================
>>>         qemu on the destination
>>>               |
>>>               V
>>>         open(/dev/umem)
>>>               |
>>>               V
>>>         UMEM_INIT
>>>               |
>>>               V
>>>         Here we have two file descriptors to
>>>         umem device and shmem file
>>>               |
>>>               |                                  umemd
>>>               |                                  daemon on the destination
>>>               |
>>>               V    create pipe to communicate
>>>         fork()---------------------------------------,
>>>               |                                      |
>>>               V                                      |
>>>         close(socket)                                V
>>>         close(shmem)                              mmap(shmem file)
>>>               |                                      |
>>>               V                                      V
>>>         mmap(umem device) for guest RAM           close(shmem file)
>>>               |                                      |
>>>         close(umem device)                           |
>>>               |                                      |
>>>               V                                      |
>>>         wait for ready from daemon <----pipe-----send ready message
>>>               |                                      |
>>>               |                                  Here the daemon takes over
>>>         send ok------------pipe---------------> the owner of the socket
>>>               |                                  to the source
>>>               V                                      |
>>>         entering post copy stage                     |
>>>         start guest execution                        |
>>>               |                                      |
>>>               V                                      V
>>>         access guest RAM                          read() to get faulted pages
>>>               |                                      |
>>>               V                                      V
>>>         page fault ------------------------------>page offset is returned
>>>         block                                        |
>>>                                                      V
>>>                                                   pull page from the source
>>>                                                   write the page contents
>>>                                                   to the shmem.
>>>                                                      |
>>>                                                      V
>>>         unblock     <-----------------------------write() to tell served pages
>>>         the fault handler returns the page
>>>         page fault is resolved
>>>               |
>>>               |                                   pages can be sent
>>>               |                                   in the background
>>>               |                                      |
>>>               |                                      V
>>>               |                                   write()
>>>               |                                      |
>>>               V                                      V
>>>         The specified pages<-----pipe------------request to touch pages
>>>         are made present by                          |
>>>         touching guest RAM.                          |
>>>               |                                      |
>>>               V                                      V
>>>              reply-------------pipe-------------> release the cached page
>>>               |                                   madvise(MADV_REMOVE)
>>>               |                                      |
>>>               V                                      V
>>>
>>>                  all the pages are pulled from the source
>>>
>>>               |                                      |
>>>               V                                      V
>>>         the vma becomes anonymous<----------------UMEM_MAKE_VMA_ANONYMOUS
>>>        (note: I'm not sure if this can be implemented or not)
>>>               |                                      |
>>>               V                                      V
>>>         migration completes                        exit()
>>>
>>>
>>>
>>>
>>> Isaku Yamahata (41):
>>>   arch_init: export sort_ram_list() and ram_save_block()
>>>   arch_init: export RAM_SAVE_xxx flags for postcopy
>>>   arch_init/ram_save: introduce constant for ram save version = 4
>>>   arch_init: refactor host_from_stream_offset()
>>>   arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
>>>   arch_init: refactor ram_save_block()
>>>   arch_init/ram_save_live: factor out ram_save_limit
>>>   arch_init/ram_load: refactor ram_load
>>>   arch_init: introduce helper function to find ram block with id string
>>>   arch_init: simplify a bit by ram_find_block()
>>>   arch_init: factor out counting transferred bytes
>>>   arch_init: factor out setting last_block, last_offset
>>>   exec.c: factor out qemu_get_ram_ptr()
>>>   exec.c: export last_ram_offset()
>>>   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
>>>   savevm: qemu_pending_size() to return pending buffered size
>>>   savevm, buffered_file: introduce method to drain buffer of buffered
>>>     file
>>>   QEMUFile: add qemu_file_fd() for later use
>>>   savevm/QEMUFile: drop qemu_stdio_fd
>>>   savevm/QEMUFileSocket: drop duplicated member fd
>>>   savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
>>>   savevm/QEMUFile: introduce qemu_fopen_fd
>>>   migration.c: remove redundant line in migrate_init()
>>>   migration: export migrate_fd_completed() and migrate_fd_cleanup()
>>>   migration: factor out parameters into MigrationParams
>>>   buffered_file: factor out buffer management logic
>>>   buffered_file: Introduce QEMUFileNonblock for nonblock write
>>>   buffered_file: add qemu_file to read/write to buffer in
>>>     memory
>>>   umem.h: import Linux umem.h
>>>   update-linux-headers.sh: teach umem.h to update-linux-headers.sh
>>>   configure: add CONFIG_POSTCOPY option
>>>   savevm: add new section that is used by postcopy
>>>   postcopy: introduce -postcopy and -postcopy-flags option
>>>   postcopy outgoing: add -p and -n option to migrate command
>>>   postcopy: introduce helper functions for postcopy
>>>   postcopy: implement incoming part of postcopy live migration
>>>   postcopy: implement outgoing part of postcopy live migration
>>>   postcopy/outgoing: add forward, backward option to specify the size
>>>     of prefault
>>>   postcopy/outgoing: implement prefault
>>>   migrate: add -m (movebg) option to migrate command
>>>   migration/postcopy: add movebg mode
>>>
>>>  Makefile.target                 |    5 +
>>>  arch_init.c                     |  298 ++++---
>>>  arch_init.h                     |   20 +
>>>  block-migration.c               |    8 +-
>>>  buffered_file.c                 |  322 ++++++--
>>>  buffered_file.h                 |   32 +
>>>  configure                       |   12 +
>>>  cpu-all.h                       |    9 +
>>>  exec-obsolete.h                 |    1 +
>>>  exec.c                          |   87 ++-
>>>  hmp-commands.hx                 |   18 +-
>>>  hmp.c                           |   10 +-
>>>  linux-headers/linux/umem.h      |   42 +
>>>  migration-exec.c                |   12 +-
>>>  migration-fd.c                  |   25 +-
>>>  migration-postcopy-stub.c       |   77 ++
>>>  migration-postcopy.c            | 1771 +++++++++++++++++++++++++++++++++++++++
>>>  migration-tcp.c                 |   25 +-
>>>  migration-unix.c                |   26 +-
>>>  migration.c                     |   97 ++-
>>>  migration.h                     |   47 +-
>>>  qapi-schema.json                |    4 +-
>>>  qemu-common.h                   |    2 +
>>>  qemu-file.h                     |    8 +-
>>>  qemu-options.hx                 |   25 +
>>>  qmp-commands.hx                 |    4 +-
>>>  savevm.c                        |  177 ++++-
>>>  scripts/update-linux-headers.sh |    2 +-
>>>  sysemu.h                        |    4 +-
>>>  umem.c                          |  364 ++++++++
>>>  umem.h                          |  101 +++
>>>  vl.c                            |   16 +-
>>>  vmstate.h                       |    2 +-
>>>  33 files changed, 3373 insertions(+), 280 deletions(-)
>>>  create mode 100644 linux-headers/linux/umem.h
>>>  create mode 100644 migration-postcopy-stub.c
>>>  create mode 100644 migration-postcopy.c
>>>  create mode 100644 umem.c
>>>  create mode 100644 umem.h
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>>
>

--
yamahata
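
For readers following the quoted work flow, the daemon side (read the
faulted page offsets, pull those pages from the source, write them into
the shmem file, report them as served, and otherwise fetch the remaining
pages in the background) can be modelled as a toy in Python. Everything
here is hypothetical: ToySource, ToyUmemd and PAGE_SIZE only stand in
for the migration socket, umemd and the shmem file, and no real umem
ioctl or pipe is involved. It is a sketch of the ordering of steps, not
of the code in the series.

```python
PAGE_SIZE = 4096  # assumed; host page size == guest page size


class ToySource:
    """Stand-in for the migration source: it holds the guest RAM image."""

    def __init__(self, ram):
        if len(ram) % PAGE_SIZE != 0:
            raise ValueError("RAM size must be a multiple of PAGE_SIZE")
        self.ram = ram

    def pull(self, pgoff):
        """Return the contents of page `pgoff`."""
        start = pgoff * PAGE_SIZE
        return self.ram[start:start + PAGE_SIZE]


class ToyUmemd:
    """Stand-in for umemd on the destination (hypothetical model)."""

    def __init__(self, source):
        self.source = source
        self.nr_pages = len(source.ram) // PAGE_SIZE
        self.shmem = bytearray(len(source.ram))  # plays the shmem file
        self.served = set()                      # page offsets already present

    def _fetch(self, pgoff):
        # Pull the page from the source and write it into the shmem.
        start = pgoff * PAGE_SIZE
        self.shmem[start:start + PAGE_SIZE] = self.source.pull(pgoff)
        self.served.add(pgoff)

    def serve_faults(self, faulted):
        """On-demand path: resolve faulted offsets, return those newly served."""
        done = []
        for pgoff in faulted:
            if pgoff not in self.served:
                self._fetch(pgoff)
                done.append(pgoff)
        return done

    def background_step(self, budget):
        """Background path: fetch up to `budget` pending pages in address order."""
        pending = [p for p in range(self.nr_pages) if p not in self.served]
        batch = pending[:budget]
        for pgoff in batch:
            self._fetch(pgoff)
        return batch

    def complete(self):
        """True once all the pages have been pulled from the source."""
        return len(self.served) == self.nr_pages
```

In this model a guest fault maps to serve_faults(), the background
transfer to repeated background_step() calls, and complete() marks the
point where the diagram says all the pages are pulled from the source.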