RE: [RFC Design Doc] Speed up live migration by skipping free pages

"Li, Liang Z" <liang.z.li@xxxxxxxxx> · Thu, 24 Mar 2016 15:39:58 +0000

> > > > I mean why do you think that's can't guaranteed to work.
> > > > Yes, ram_addr_t is not guaranteed to equal GPA of the block. But I
> > > > didn't use them as GPA. The code in the
> > > > filter_out_guest_free_pages() in my patch just follow the style of
> > > > the latest change of ram_list.dirty_memory[].
> > > >
> > > > The free page bitmap got from the guest in my RFC patch has been
> > > > filtered out the 'hole', so the bit N of the free page bitmap and
> > > > the bit N in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]->blocks
> > > > are corresponding to the same guest page.  Right?
> > > > If it's true, I think I am doing the right thing?
> > > >
> > > >
> > > > Liang
> > >
> > > There's no guarantee that there's a single 'hole'
> > > even on the PC, and we want balloon to be portable.
> > >
> >
> > As long as we know how many 'hole' and where the holes are.
> > we can filter out them. QEMU should have this kind of information.
> > I know my RFC patch passed an arch specific free page bitmap is not a
> > good idea. So in my design, I changed this by passing a loose free
> > page bitmap which contains the holes, and let QEMU to filter out the
> > holes according to some arch specific information. This can make balloon be
> > portable.
> 
> Only if you write the arch specific thing for all arches.

I plan to keep a function stub for each arch to implement, and I have already done that for X86.

> This concept of holes simply does not match how we manage memory in
> qemu.

I don't know if it works for other arches, but it works for X86.

> > > So I'm not sure I understand what your patch is doing, do you mean
> > > you pass the GPA to ram addr mapping from host to guest?
> > >
> >
> > No, my patch passed the 'lowmem', which helps to filter out the hole from
> > host to guest.
> > The design has changed this.
> >
> > > That can be made to work but it's not a good idea, and I don't see
> > > why would it be faster than doing the same translation host side.
> > >
> >
> > It's faster because there is no address translation, most of them are bitmap
> > operation.
> >
> > Liang
> 
> It's just wrong to say that there is no translation. Of course there has to be
> one.
> 
> Fundamentally guest uses GPA as an offset in the bitmap. QEMU uses
> ram_addr_t for migration so you either translate GPA to ram_addr_t or
> ram_addr_t to GPA.
> 
> I think the reason for the speedup that you observe is that you only need to
> translate ram_addr_t to GPA once per ramblock, which is much faster than
> translating GPA to ram_addr_t for each page.
> 

Yes, exactly! 
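
To make the once-per-ramblock point concrete, here is a minimal sketch in C.
It is not the code from the RFC patch and it does not use QEMU's real data
structures: struct demo_block, test_bit_at(), clear_bit_at() and
filter_free_pages() are all hypothetical names made up for illustration, and
it assumes each block is contiguous in both GPA space and ram_addr_t space.
The free page bitmap is indexed by GPA page frame number, the migration
bitmap by ram_addr_t page. The two base page numbers are looked up once per
block, and the loop only adds the same index to each base.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified stand-in for a RAM block; not QEMU's RAMBlock. */
struct demo_block {
    uint64_t gpa_pfn;   /* first guest page frame number of the block */
    uint64_t ram_pfn;   /* first page index of the block in ram_addr_t space */
    uint64_t npages;    /* block length in pages */
};

/* Return 1 if bit nr is set in map, 0 otherwise. */
static int test_bit_at(const unsigned long *map, uint64_t nr)
{
    return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Clear bit nr in map. */
static void clear_bit_at(unsigned long *map, uint64_t nr)
{
    map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

/*
 * Clear the migration bit of every page the guest reported as free.
 * free_map is indexed by GPA page frame number, mig_map by ram_addr_t page.
 * The relation between the two address spaces is resolved once per block
 * (the two base PFNs); no per-page address lookup happens inside the loop.
 */
static void filter_free_pages(const struct demo_block *b,
                              const unsigned long *free_map,
                              unsigned long *mig_map)
{
    assert(b != NULL && free_map != NULL && mig_map != NULL);

    for (uint64_t i = 0; i < b->npages; i++) {
        if (test_bit_at(free_map, b->gpa_pfn + i)) {
            clear_bit_at(mig_map, b->ram_pfn + i);
        }
    }
}
```

A real implementation would likely work on whole words rather than single
bits when the two bases are suitably aligned, but the bit-at-a-time loop is
enough to show where the translation cost goes.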

Liang