Re: Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

SeongJae Park <sj38.park@xxxxxxxxx> · Thu, 12 Dec 2019 17:15:37 +0100

On Thu, 12 Dec 2019 16:23:17 +0100 "Roger Pau Monné" <roger.pau@xxxxxxxxxx> wrote:

> > On Thu, 12 Dec 2019 12:42:47 +0100 "Roger Pau Monné" <roger.pau@xxxxxxxxxx> wrote:
> > > > On the slow block device
> > > > ------------------------
> > > > 
> > > >     max_pgs   Min       Max       Median     Avg    Stddev
> > > >     0         38.7      45.8      38.7       40.12  3.1752165
> > > >     1024      38.7      45.8      38.7       40.12  3.1752165
> > > >     No difference proven at 95.0% confidence
> > > > 
> > > > On the fast block device
> > > > ------------------------
> > > > 
> > > >     max_pgs   Min       Max       Median     Avg    Stddev
> > > >     0         417       423       420        419.4  2.5099801
> > > >     1024      414       425       416        417.8  4.4384682
> > > >     No difference proven at 95.0% confidence
> > > 
> > > This is intriguing, as it seems to prove that the usage of a cache of
> > > free pages is irrelevant performance wise.
> > > 
> > > The pool of free pages was introduced long ago, and it's possible that
> > > recent improvements to the balloon driver had made such pool useless,
> > > at which point it could be removed instead of worked around.
> > 
> > I guess the grant page allocation overhead in this test scenario is really
> > small.  In an absence of memory pressure, fragmentation, and NUMA imbalance,
> > the latency of the page allocation ('get_page()') is very short, as it will
> > success in the fast path.
> 
> The allocation of the pool of free pages involves more than get_page,
> it uses gnttab_alloc_pages which in the worse case will allocate a
> page and balloon it out issuing one hypercall.
> 
> > Few years ago, I once measured the page allocation latency on my machine.
> > Roughly speaking, it was about 1us in best case, 100us in worst case, and 5us
> > in average.  Please keep in mind that the measurement was not designed and
> > performed in serious way.  Thus the results could have profile overhead in it,
> > though.  While keeping that in mind, let's simply believe the number and ignore
> > the latency of the block layer, blkback itself (including the grant
> > mapping), and anything else including context switch, cache miss, but the
> > allocation.  In other words, suppose that the grant page allocation is only one
> > source of the overhead.  It will be able to achieve 1 million IOPS (4KB *
> > 1MIOPS = 4 GB/s) in the best case, 200 thousand IOPS (800 MB/s) in average, and
> > 10 thousand IOPS (40 MB/s) in worst case.  Based on this coarse calculation, I
> > think the test results is reasonable.
> > 
> > This also means that the effect of the blkback's free pages pool might be
> > visible under page allocation fast path failure situation.  Nevertheless, it
> > would be also hard to measure that in micro level unless the measurement is
> > well designed and controlled.
> > 
> > > 
> > > Do you think you could perform some more tests (as pointed out above
> > > against the block device to skip the fs overhead) and report back the
> > > results?
> > 
> > To be honest, I'm not sure whether additional tests are really necessary,
> > because I think the `dd` test and the results explanation already makes some
> > sense and provide the minimal proof of the concept.  Also, this change is a
> > fallback for the memory pressure situation, which is an error path in some
> > point of view.  Such errorneous situation might not happen frequently and if
> > the situation is not solved in short time, something much worse (e.g., OOM kill
> > of the user space xen control processes) than temporal I/O performance
> > degradation could happen.  Thus, I'm not sure whether such detailed performance
> > measurement is necessary for this rare error handling change.
> 
> Right, my main concern is that we seem to be adding duck tape so
> things don't fall apart, but if such cache is really not beneficial
> from a performance PoV I would rather see it go away than adding more
> stuff to it in order to workaround corner cases like memory
> starvation.

Right, if the cache is really giving no benefit, it would be much better to
simply remove it.  However, as mentioned before, I'm not sure whether it is
useless at all.  Maybe we could do some more detailed test to know that, but it
would be an out of scope of this patch.

> 
> Anyway, I guess we can take such change, but long term we need to look
> into fixing grants to not use ballooned pages, and figure out if the
> blkback free page cache is really useful or not.

Totally agreed.

Thanks,
SeongJae Park

> 
> Thanks, Roger.
>