Re: [PATCH v2 3/8] migration: show the statistics of compression

Peter Xu <peterx@xxxxxxxxxx> · Thu, 26 Jul 2018 13:29:15 +0800

On Wed, Jul 25, 2018 at 05:44:02PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@xxxxxxxxxx) wrote:
> > On Mon, Jul 23, 2018 at 03:39:18PM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 07/23/2018 12:36 PM, Peter Xu wrote:
> > > > On Thu, Jul 19, 2018 at 08:15:15PM +0800, guangrong.xiao@xxxxxxxxx wrote:
> > > > > @@ -1597,6 +1608,24 @@ static void migration_update_rates(RAMState *rs, int64_t end_time)
> > > > >               rs->xbzrle_cache_miss_prev) / iter_count;
> > > > >           rs->xbzrle_cache_miss_prev = xbzrle_counters.cache_miss;
> > > > >       }
> > > > > +
> > > > > +    if (migrate_use_compression()) {
> > > > > +        uint64_t comp_pages;
> > > > > +
> > > > > +        compression_counters.busy_rate = (double)(compression_counters.busy -
> > > > > +            rs->compress_thread_busy_prev) / iter_count;
> > > > 
> > > > Here I'm not sure it's correct...
> > > > 
> > > > "iter_count" stands for ramstate.iterations.  It's increased per
> > > > ram_find_and_save_block(), so IMHO it might contain multiple guest
> > > 
> > > ram_find_and_save_block() returns if a page is successfully posted and
> > > it only posts 1 page out at one time.
> > 
> > ram_find_and_save_block() calls ram_save_host_page(), and we should be
> > sending multiple guest pages in ram_save_host_page() if the host page
> > is a huge page?
> > 
> > > 
> > > > pages.  However compression_counters.busy should be per guest page.
> > > > 
> > > 
> > > Actually, it's derived from xbzrle_counters.cache_miss_rate:
> > >         xbzrle_counters.cache_miss_rate = (double)(xbzrle_counters.cache_miss -
> > >             rs->xbzrle_cache_miss_prev) / iter_count;
> > 
> > Then this is suspicious to me too...
> 
> Actually, I think this isn't totally wrong; iter_count is the *difference* in
> iterations since the last time it was updated:
> 
>    uint64_t iter_count = rs->iterations - rs->iterations_prev;
> 
>         xbzrle_counters.cache_miss_rate = (double)(xbzrle_counters.cache_miss -
>             rs->xbzrle_cache_miss_prev) / iter_count;
> 
> so this is:
>       cache-misses-since-last-update
>       ------------------------------
>         iterations since last-update
> 
> so the 'miss_rate' is ~misses / iteration.
> Although that doesn't really correspond to time.

I'm not sure I follow. My concern is that the two counters are kept
at different granularities, which might be problematic:

- xbzrle_counters.cache_miss is incremented in save_xbzrle_page(), so
  it has per-guest-page granularity

- RAMState.iterations is incremented once per ram_find_and_save_block()
  call, so it has per-host-page granularity
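To make the mismatch concrete, here is a toy model in C. It is only a
sketch: `struct toy_counters`, `toy_save_host_page()` and
`toy_save_guest_page()` are hypothetical stand-ins, not the real QEMU
code. The per-guest-page counter is bumped inside the loop, while the
per-host-page counter is bumped once per call:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two counters discussed above; these
 * are NOT the real RAMState / xbzrle_counters definitions. */
struct toy_counters {
    uint64_t iterations;   /* bumped once per host page (block call) */
    uint64_t cache_miss;   /* bumped once per guest page that misses */
};

#define TOY_GUEST_PAGE_SIZE 4096ULL

/* Per-guest-page step: models save_xbzrle_page() missing the cache. */
static void toy_save_guest_page(struct toy_counters *c, int cache_hit)
{
    if (!cache_hit) {
        c->cache_miss++;
    }
}

/* Per-host-page step: models one ram_find_and_save_block() call that
 * sends a whole host page, one guest page at a time. */
static void toy_save_host_page(struct toy_counters *c,
                               uint64_t host_page_size, int cache_hit)
{
    uint64_t off;

    /* A host page is always a whole number of guest pages. */
    assert(host_page_size % TOY_GUEST_PAGE_SIZE == 0);

    for (off = 0; off < host_page_size; off += TOY_GUEST_PAGE_SIZE) {
        toy_save_guest_page(c, cache_hit);
    }
    c->iterations++;   /* only once, however many guest pages were sent */
}
```

Calling toy_save_host_page(&c, 2ULL * 1024 * 1024, 0) on a zeroed
struct leaves c.iterations at 1 but c.cache_miss at 512.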

For example, when we migrate a 2M huge page in the guest, we only
increase RAMState.iterations by 1 (since ram_find_and_save_block() is
called once), but we might increase xbzrle_counters.cache_miss 512
times (2M/4K, since save_xbzrle_page() is called once per 4K guest
page) if every page misses the cache.  Then IMHO the reported cache
miss rate would be 512/1 = 51200%, while it should really be 100%.
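Plugging those numbers into the delta formula Dave quoted shows the
difference.  A minimal sketch, assuming a hypothetical per-guest-page
counter (`pages`/`pages_prev`) that is not part of the patch; the first
helper mirrors the patch's misses-per-iteration computation:

```c
#include <assert.h>
#include <stdint.h>

/* Rate as computed in the patch: misses since the last update divided
 * by iterations (host pages) since the last update. */
static double toy_rate_per_iteration(uint64_t miss, uint64_t miss_prev,
                                     uint64_t iter, uint64_t iter_prev)
{
    uint64_t iter_count = iter - iter_prev;

    assert(iter_count != 0);
    return (double)(miss - miss_prev) / iter_count;
}

/* Hypothetical alternative: divide by guest pages handled since the
 * last update, so numerator and denominator share one granularity. */
static double toy_rate_per_guest_page(uint64_t miss, uint64_t miss_prev,
                                      uint64_t pages, uint64_t pages_prev)
{
    uint64_t page_count = pages - pages_prev;

    assert(page_count != 0);
    return (double)(miss - miss_prev) / page_count;
}
```

For the 2M case, toy_rate_per_iteration(512, 0, 1, 0) gives 512.0
(51200%), while toy_rate_per_guest_page(512, 0, 512, 0) gives 1.0
(100%).  Dividing by a counter kept at the same granularity as the
numerator is one possible fix, assuming such a counter is maintained.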

Regards,

-- 
Peter Xu