RE: [RFC 00/10] KVM: Add TMEM host/guest support

Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> · Mon, 11 Jun 2012 08:44:21 -0700 (PDT)

> From: Avi Kivity [mailto:avi@xxxxxxxxxx]
> >
> > The guest doesn't do eviction at all, in fact - it doesn't know how big
> > the cache is so even if it wanted to, it couldn't evict pages (the only
> > thing it does is invalidate pages which have changed in the guest).
> 
> IIUC, when the guest reads a page, it first has to make room in its own
> pagecache; before dropping a clean page it calls cleancache to dispose
> of it, which calls a hypercall which compresses and stores it on the
> host.  Next a page is allocated and a cleancache hypercall is made to
> see if it is in host tmem.  So two hypercalls per page, once guest
> pagecache is full.

Yes, Avi is correct here.
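
To make the two-hypercalls-per-page flow concrete, here is a toy
user-space model. It is a sketch only, not code from the RFC or from
cleancache: the names (toy_tmem_put, toy_tmem_get, toy_read_page), the
fixed-size host table, and the key scheme are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#define HOST_SLOTS 64

struct host_slot {
    bool used;
    unsigned long key;
};

static struct host_slot host_tmem[HOST_SLOTS];
static unsigned long hypercalls;

/* "Hypercall" 1: the guest offers a clean page it is about to drop. */
static bool toy_tmem_put(unsigned long key)
{
    struct host_slot *s = &host_tmem[key % HOST_SLOTS];

    hypercalls++;
    s->used = true;   /* the host may overwrite an older page in this slot */
    s->key = key;
    return true;
}

/* "Hypercall" 2: the guest asks whether the host still holds a page. */
static bool toy_tmem_get(unsigned long key)
{
    struct host_slot *s = &host_tmem[key % HOST_SLOTS];

    hypercalls++;
    return s->used && s->key == key;
}

/*
 * One page read while the guest pagecache is full: put the victim page,
 * then look up the wanted page. Returns true if the host had the wanted
 * page, i.e. the disk read could be skipped.
 */
static bool toy_read_page(unsigned long victim, unsigned long wanted)
{
    toy_tmem_put(victim);
    return toy_tmem_get(wanted);
}
```

Each call to toy_read_page() bumps the hypercall counter by exactly
two, whether or not the lookup hits.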

> >> This is pretty steep.  We have flash storage doing a million iops/sec,
> >> and here you add 19 microseconds to that.
> >
> > Might be interesting to test it with flash storage as well...

Well, to be fair, you are comparing a device that costs many
thousands of $US to a software solution that uses idle CPU
cycles and no additional RAM.

> Batching will drastically reduce the number of hypercalls.

For the record, batching CAN be implemented... ramster is essentially
an implementation of batching where the local system is the "guest"
and the remote system is the "host".  But with ramster the
overhead to move the data (whether batched or not) is much MUCH
worse than a hypercall and ramster still shows a performance advantage.

So, IMHO, one step at a time.  Get the foundation code in
place and tune it later if a batching implementation can
be demonstrated to improve performance sufficiently.
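
For reference, the kind of batching under discussion could look
roughly like the toy model below. This is only a sketch under stated
assumptions: the batch size, the queue, and every name in it
(toy_batch_put, toy_batch_flush, toy_batch_hypercall) are hypothetical,
and it models deferring puts only, not gets.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of batching; names and batch size are hypothetical. */
#define TOY_BATCH 8

static unsigned long batch_keys[TOY_BATCH];
static size_t batch_len;
static unsigned long batch_hypercalls;
static unsigned long batch_pages_sent;

/* Stand-in for a single hypercall that carries n page keys at once. */
static void toy_batch_hypercall(const unsigned long *keys, size_t n)
{
    (void)keys;               /* a real host would walk the key list */
    batch_hypercalls++;
    batch_pages_sent += n;
}

/* Send whatever is queued; a no-op when the queue is empty. */
static void toy_batch_flush(void)
{
    if (batch_len == 0)
        return;
    toy_batch_hypercall(batch_keys, batch_len);
    batch_len = 0;
}

/* Queue one put; flush automatically when the batch is full. */
static void toy_batch_put(unsigned long key)
{
    assert(batch_len < TOY_BATCH);
    batch_keys[batch_len++] = key;
    if (batch_len == TOY_BATCH)
        toy_batch_flush();
}
```

With a batch size of 8, queueing 20 puts and then flushing once costs
3 modeled hypercalls instead of 20.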

> A different
> alternative is to use ballooning to feed the guest free memory so it
> doesn't need to hypercall at all.  Deciding how to divide free memory
> among the guests is hard (but then so is deciding how to divide tmem
> memory among guests), and adding dedup on top of that is also hard (ksm?
> zksm?).  IMO letting the guest have the memory and manage it on its own
> will be much simpler and faster compared to the constant chatting that
> has to go on if the host manages this memory.

Here we disagree, maybe violently.  All existing solutions that
try to manage memory across multiple tenants from an "external
memory manager policy" fail miserably.  Tmem is at least trying
something new by actively involving both the host and the guest
in the policy (guest decides which pages, host decides how many)
and without the massive changes required for something like
IBM's solution (forgot what it was called).  Yes, tmem has
overhead but since the overhead only occurs where pages
would otherwise have to be read/written from disk, the
overhead is well "hidden".
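
The "guest decides which pages, host decides how many" split can be
sketched as below. This is an illustrative toy, not how any tmem
backend actually accounts for pages: the per-guest quota and the names
(toy_guest, toy_host_accept_put, toy_host_set_quota) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; the quota mechanism and all names are hypothetical. */
struct toy_guest {
    unsigned long held;   /* pages the host currently holds for this guest */
    unsigned long quota;  /* host-side limit; the guest never sees it */
};

/*
 * The guest chooses WHICH page to offer (not modeled here); the host
 * alone decides whether it has room to take it.
 */
static bool toy_host_accept_put(struct toy_guest *g)
{
    if (g->held >= g->quota)
        return false;     /* host declines; guest just drops the clean page */
    g->held++;
    return true;
}

/* The host can shrink a guest's share at any time without asking it. */
static void toy_host_set_quota(struct toy_guest *g, unsigned long quota)
{
    g->quota = quota;
    if (g->held > quota)
        g->held = quota;  /* excess pages are simply discarded in this model */
}
```

The guest needs no knowledge of the quota: a declined put just means
the clean page is dropped as it would have been without tmem.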

BTW, dedup in zcache is fairly easy to implement because the
pages can only be read/written as an entire page and only
through a well-defined API.  Xen does it (with optional
compression), zcache could also, but it never made much sense
for zcache when there was only one tenant.  KVM of course
benefits from KSM, but IIUC KSM only works on anonymous pages.
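
As a rough illustration of why whole-page, API-only access makes dedup
simple, here is a toy content-addressed page store. It is a sketch, not
the Xen or zcache implementation: the FNV-1a hash, the open-addressed
table, and all names (toy_dedup_put and friends) are assumptions chosen
for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model, NOT kernel code. Sizes and names are hypothetical. */
#define TOY_PAGE_SIZE   4096
#define TOY_DEDUP_SLOTS 16

struct toy_dedup_entry {
    bool used;
    unsigned int refs;                  /* how many puts share this copy */
    unsigned char data[TOY_PAGE_SIZE];
};

static struct toy_dedup_entry toy_dedup_store[TOY_DEDUP_SLOTS];
static size_t toy_dedup_unique;         /* distinct pages actually stored */

/* FNV-1a over the whole page; pages are only ever seen whole. */
static uint32_t toy_page_hash(const unsigned char *page)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < TOY_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Store a page, sharing an existing copy when the contents match.
 * Returns the slot index, or -1 if the store is full.
 */
static int toy_dedup_put(const unsigned char *page)
{
    uint32_t start = toy_page_hash(page) % TOY_DEDUP_SLOTS;

    for (size_t i = 0; i < TOY_DEDUP_SLOTS; i++) {
        size_t idx = (start + i) % TOY_DEDUP_SLOTS;
        struct toy_dedup_entry *e = &toy_dedup_store[idx];

        if (!e->used) {                 /* first time we see this content */
            e->used = true;
            e->refs = 1;
            memcpy(e->data, page, TOY_PAGE_SIZE);
            toy_dedup_unique++;
            return (int)idx;
        }
        /* Compare full contents so a hash collision never merges pages. */
        if (memcmp(e->data, page, TOY_PAGE_SIZE) == 0) {
            e->refs++;
            return (int)idx;
        }
    }
    return -1;
}
```

Because a stored page cannot be modified in place, no copy-on-write
handling is needed in this model; a duplicate put only bumps a
reference count.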