RE: [GIT PULL] mm: frontswap (for 3.2 window)

Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> · Wed, 2 Nov 2011 14:14:16 -0700 (PDT)

> From: Rik van Riel [mailto:riel@xxxxxxxxxx]
> Subject: Re: [GIT PULL] mm: frontswap (for 3.2 window)
> 
> On 10/31/2011 07:36 PM, Dan Magenheimer wrote:
> >> From: Andrea Arcangeli [mailto:aarcange@xxxxxxxxxx]
> 
> >>> real work to do instead and (2) that vmexit/vmenter is horribly
> >>
> >> Sure the CPU has another 1000 VMs to schedule. This is like saying
> >> virtio-blk isn't needed on desktop virt because the desktop isn't
> >> doing much I/O. Absurd argument, there are another 1000 desktops doing
> >> I/O at the same time of course.
> >
> > But this is truly different, I think at least for the most common
> > cases, because the guest is essentially out of physical memory if it
> > is swapping.  And the vmexit/vmenter (I assume, I don't really
> > know KVM) gives the KVM scheduler the opportunity to schedule
> > another of those 1000 VMs if it wishes.
> 
> I believe the problem Andrea is trying to point out here is
> that the proposed API cannot handle a batch of pages to be
> pushed into frontswap/cleancache at one time.

That wasn't the part of Andrea's discussion I meant, but I
am getting foggy now, so let's address your point rather
than mine.

> Even if the current back-end implementations are synchronous
> and can only do one page at a time, I believe it would still
> be a good idea to have the API able to handle a vector with
> a bunch of pages all at once.
> 
> That way we can optimize the back-ends as required, at some
> later point in time.
> 
> If enough people start using tmem, such bottlenecks will show
> up at some point :)
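The vector idea can be sketched in user-space C. Everything here is hypothetical: `struct tmem_like_ops`, `put_pages`, `put_vector` and the two example backends are illustrative names, not the real frontswap_ops interface. The sketch only shows that a front end can offer a vector call and fall back to one page at a time for backends that are still synchronous.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Hypothetical descriptor: swap (type, offset) key plus the page contents. */
struct page_desc {
	unsigned type;
	unsigned long offset;
	const unsigned char *data;
};

/* Hypothetical ops table: a single-page put and an optional vector put. */
struct tmem_like_ops {
	/* Returns 0 if the page was stored. */
	int (*put_page)(unsigned type, unsigned long offset, const unsigned char *data);
	/* Returns how many pages were stored; NULL if the backend cannot batch. */
	size_t (*put_pages)(const struct page_desc *pages, size_t n);
};

/*
 * Front-end helper: pass the whole vector down when the backend supports it,
 * otherwise fall back to one put_page() call per page.
 */
static size_t put_vector(const struct tmem_like_ops *ops,
			 const struct page_desc *pages, size_t n)
{
	size_t stored = 0;

	assert(ops->put_page != NULL);
	if (ops->put_pages != NULL)
		return ops->put_pages(pages, n);
	for (size_t i = 0; i < n; i++)
		if (ops->put_page(pages[i].type, pages[i].offset, pages[i].data) == 0)
			stored++;
	return stored;
}

/* Example synchronous backend: one call per page, accepts everything. */
static int sync_calls;
static int sync_put_page(unsigned type, unsigned long offset,
			 const unsigned char *data)
{
	(void)type; (void)offset; (void)data;
	sync_calls++;
	return 0;
}

/* Example batching backend: one call for the whole vector. */
static int batch_calls;
static size_t batch_put_pages(const struct page_desc *pages, size_t n)
{
	(void)pages;
	batch_calls++;
	return n;
}
```

A synchronous backend fills in only put_page and pays one call per page. A backend that later learns to batch fills in put_pages, and the front end hands it the whole vector with no change to callers.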

It occurs to me that batching could be done locally without
changing the in-kernel "API" (i.e. frontswap_ops): the
guest-side KVM tmem backend driver could compress pages
into guest-side memory and make a single hypercall (one
vmexit/vmenter) whenever it has collected enough pages for
a batch. The "get" and "flush" operations would then search this
guest-side local cache first and make a hypercall only on a miss.
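Here is a rough user-space C sketch of that guest-side batching. It rests on several assumptions. hypercall_put_batch, hypercall_get and hypercall_flush are hypothetical stand-ins: they simulate the host with an in-process array and count calls as a proxy for vmexit/vmenter. Compression is omitted, so pages are copied as-is. BATCH_SIZE is arbitrary. Finally, a (type, offset) is assumed to be flushed before it is put again, so a page is never both in the local batch and on the host.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE  4096
#define BATCH_SIZE 8      /* arbitrary batch size for this sketch */
#define HOST_SLOTS 64     /* capacity of the simulated host store */

/* One swapped-out page, keyed by swap (type, offset). Compression omitted. */
struct batch_entry {
	unsigned type;
	unsigned long offset;
	unsigned char data[PAGE_SIZE];
};

static struct batch_entry local[BATCH_SIZE];      /* guest-side batch */
static size_t local_len;
static struct batch_entry host_store[HOST_SLOTS]; /* simulated host memory */
static size_t host_len;
static int hypercalls;  /* each hypothetical hypercall models one vmexit/vmenter */

/* Return the index of (type, offset) in arr, or -1 if absent. */
static int find(const struct batch_entry *arr, size_t len,
		unsigned type, unsigned long offset)
{
	for (size_t i = 0; i < len; i++)
		if (arr[i].type == type && arr[i].offset == offset)
			return (int)i;
	return -1;
}

/* Hypothetical hypercall: hand a whole batch to the host in one exit. */
static void hypercall_put_batch(const struct batch_entry *e, size_t n)
{
	hypercalls++;
	for (size_t i = 0; i < n; i++) {
		int j = find(host_store, host_len, e[i].type, e[i].offset);
		if (j < 0) {
			assert(host_len < HOST_SLOTS);  /* sketch has no overflow handling */
			j = (int)host_len++;
		}
		host_store[j] = e[i];
	}
}

/* Hypothetical hypercall: copy one page back from the host. */
static int hypercall_get(unsigned type, unsigned long offset, unsigned char *out)
{
	hypercalls++;
	int j = find(host_store, host_len, type, offset);
	if (j < 0)
		return -1;
	memcpy(out, host_store[j].data, PAGE_SIZE);
	return 0;
}

/* Hypothetical hypercall: drop one page on the host. */
static void hypercall_flush(unsigned type, unsigned long offset)
{
	hypercalls++;
	int j = find(host_store, host_len, type, offset);
	if (j >= 0)
		host_store[j] = host_store[--host_len];
}

/* put: stash locally; issue one hypercall once a full batch has accumulated. */
static void guest_put(unsigned type, unsigned long offset, const unsigned char *page)
{
	int i = find(local, local_len, type, offset);
	if (i < 0)
		i = (int)local_len++;
	local[i].type = type;
	local[i].offset = offset;
	memcpy(local[i].data, page, PAGE_SIZE);
	if (local_len == BATCH_SIZE) {
		hypercall_put_batch(local, local_len);
		local_len = 0;
	}
}

/* get: search the local batch first, fall back to a hypercall. */
static int guest_get(unsigned type, unsigned long offset, unsigned char *out)
{
	int i = find(local, local_len, type, offset);
	if (i < 0)
		return hypercall_get(type, offset, out);
	memcpy(out, local[i].data, PAGE_SIZE);
	return 0;
}

/* flush: drop from the local batch if present, else ask the host. */
static void guest_flush(unsigned type, unsigned long offset)
{
	int i = find(local, local_len, type, offset);
	if (i >= 0)
		local[i] = local[--local_len];
	else
		hypercall_flush(type, offset);
}
```

With a batch size of 8, eight puts cost one hypercall. A get or flush that hits the local batch costs none, and a miss costs one.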

This is more or less what RAMster does, except it (currently)
still transmits the "batch" one (pre-compressed) page at a time.

And, when I think about it deeper (with my currently admittedly
fried brain), this may even be the best way to do batching
anyway.  I can't think offhand where else you would put
a "put batch" hook in the swap subsystem because I think
the current swap subsystem batching code only works with
adjacent "entry" numbers.

And, one more thing occurs to me then... this shows the KVM
"ABI" (hypercall) is not constrained by the existing Xen
ABI.  It can be arbitrarily more functional.

/me gets hand slapped remotely from Oracle HQ ;-)
