On Thu, Jan 30, 2020 at 7:31 AM Wang, Wei W <wei.w.wang@xxxxxxxxx> wrote:
On Thursday, January 30, 2020 11:03 PM, David Hildenbrand wrote:
> On 29.01.20 20:11, Tyler Sanderson wrote:
> >
> >
> > On Wed, Jan 29, 2020 at 2:31 AM David Hildenbrand <david@xxxxxxxxxx
> > <mailto:david@xxxxxxxxxx>> wrote:
> >
> > On 29.01.20 01:22, Tyler Sanderson via Virtualization wrote:
> > > A primary advantage of virtio balloon over other memory reclaim
> > > mechanisms is that it can pressure the guest's page cache into
> > > shrinking.
> > >
> > > However, since the balloon driver changed to using the shrinker API
> > > <https://github.com/torvalds/linux/commit/71994620bb25a8b109388fefa9e99a28e355255a#diff-fd202acf694d9eba19c8c64da3e480c9> this
> > > use case has become a bit more tricky. I'm wondering what the
> > > intended device implementation is.
> > >
> > > When inflating the balloon against page cache (i.e. no free memory
> > > remains) vmscan.c will both shrink page cache, but also invoke the
> > > shrinkers -- including the balloon's shrinker. So the balloon driver
> > > allocates memory which requires reclaim, vmscan gets this memory by
> > > shrinking the balloon, and then the driver adds the memory back to the
> > > balloon. Basically a busy no-op.
Per my understanding, the balloon allocation won’t invoke the shrinker, as __GFP_DIRECT_RECLAIM isn't set, no?
I could be wrong about the mechanism, but the device sees lots of activity on the deflate queue. The balloon is being shrunk. And this only starts once all free memory is depleted and we're inflating into page cache.
> > >
> > > If file IO is ongoing during this balloon inflation then the page cache
> > > could be growing which further puts "back pressure" on the balloon
> > > trying to inflate. In testing I've seen periods of > 45 seconds where
> > > balloon inflation makes no net forward progress.
I think this is intentional (but could be improved). Inflation does not stop when the allocation fails; it simply sleeps for a while and resumes, repeating until there is memory to inflate.
That's why you see no inflation progress for a long time under memory pressure.
As noted above the deflate queue is active, so it's not just memory allocation failures.
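
To make the two explanations above easier to compare, here is a toy model of the feedback loop described in this thread. It is only a sketch under stated assumptions, not kernel code: the `Guest` class, the `inflate` function, and the one-page-per-step accounting are all hypothetical, and the model simply assumes that every reclaim pass evicts one page-cache page and, when a balloon shrinker is registered, also deflates one balloon page.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """Hypothetical page accounting for a guest (all counts in pages)."""
    free: int
    cache: int
    balloon: int = 0
    deflated: int = 0  # pages handed back via the shrinker (deflate activity)


def inflate(guest, target, steps, shrinker=True, io_growth=0):
    """Toy inflation loop: try to add one page to the balloon per step."""
    for _ in range(steps):
        if guest.balloon >= target:
            break
        # Assumption: ongoing file I/O turns free pages into page cache.
        grown = min(io_growth, guest.free)
        guest.free -= grown
        guest.cache += grown
        if guest.free == 0:
            # Assumption: a reclaim pass evicts one page-cache page ...
            if guest.cache > 0:
                guest.cache -= 1
                guest.free += 1
            # ... and also invokes the balloon's shrinker, if registered.
            if shrinker and guest.balloon > 0:
                guest.balloon -= 1
                guest.free += 1
                guest.deflated += 1
        if guest.free > 0:
            # The driver's allocation succeeds and the page joins the balloon.
            guest.free -= 1
            guest.balloon += 1
        # Otherwise the allocation failed; a real driver would sleep and retry.
    return guest


with_shrinker = inflate(Guest(free=4, cache=100), target=50, steps=40,
                        shrinker=True, io_growth=1)
without_shrinker = inflate(Guest(free=4, cache=100), target=50, steps=40,
                           shrinker=False, io_growth=1)
print(with_shrinker)
print(without_shrinker)
```

In this model, once free memory is exhausted the run with the shrinker keeps deflating and re-inflating the same page while the page cache keeps refilling, so the balloon stops growing even though allocations never fail. The run without the shrinker makes steady progress against the page cache. Whether the real driver behaves this way depends on the GFP flags question raised above.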
Best,
Wei