Re: [PATCHv1 dont apply] RFC: kvm eoi PV using shared memory

Gleb Natapov <gleb@xxxxxxxxxx> · Mon, 16 Apr 2012 20:51:16 +0300



On Mon, Apr 16, 2012 at 07:33:28PM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 16, 2012 at 06:10:11PM +0300, Gleb Natapov wrote:
> > On Mon, Apr 16, 2012 at 04:13:29PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Apr 16, 2012 at 03:30:47PM +0300, Gleb Natapov wrote:
> > > > On Mon, Apr 16, 2012 at 03:18:25PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Apr 16, 2012 at 02:24:46PM +0300, Gleb Natapov wrote:
> > > > > > On Mon, Apr 16, 2012 at 02:09:20PM +0300, Michael S. Tsirkin wrote:
> > > > > > > Thanks very much for the review. I'll address the comments.
> > > > > > > Some questions on your comments below.
> > > > > > > 
> > > > > > > On Mon, Apr 16, 2012 at 01:08:07PM +0300, Gleb Natapov wrote:
> > > > > > > > > @@ -37,6 +38,8 @@
> > > > > > > > >  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
> > > > > > > > >  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
> > > > > > > > >  #define MSR_KVM_STEAL_TIME  0x4b564d03
> > > > > > > > > +#define MSR_KVM_EOI_EN      0x4b564d04
> > > > > > > > > +#define MSR_KVM_EOI_DISABLED 0x0L
> > > > > > > > This is valid gpa. Follow others MSR example i.e align the address to,
> > > > > > > > lets say dword, and use lsb as enable bit.
> > > > > > > 
> > > > > > > We only need a single byte, since this is per-CPU -
> > > > > > > it's better to save the memory, so no alignment is required.
> > > > > > > An explicit disable msr would also address this, right?
> > > > > > > 
> > > > > > We do not have shortage of memory.
> > > > > > Better make all MSRs works the same
> > > > > > way.
> > > > > 
> > > > > I agree it's nice to have EOI and ASYNC_PF look similar
> > > > > but wasting memory is also bad.  I'll ponder this some more.
> > > > > 
> > > > Steal time and kvm clock too and may be others (if anything left at
> > > > all). I hope you are kidding about wasting of 4 bytes per vcpu.
> > > 
> > > Not vcpu - cpu. It's wasted whenever kernel/kvm.c is built so it has
> > > cost on physical machines as well.
> > > 
> > There are less real cpus than vcpus usually :)
> 
> I'm adding this percpu always. This makes it cheap to
> access but it means it is allocated on physical cpus -
> just unused there.
> 
I got it. Suppose you have 1024 cpus: if you use a dword instead of a
byte you will spend an additional 3072 bytes (which you likely spend
anyway due to the alignment that will be applied to your u8). How much
memory do you expect to have with your 1024 cpus to care about 3072 bytes?
Are we seriously discussing it?
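
To make the layout suggested earlier in the thread concrete (align the
address to a dword and use the lsb as the enable bit), here is a minimal
sketch. All macro and function names are hypothetical and are not taken
from the patch; it only illustrates why gpa 0 stops being ambiguous once
an enable bit exists, and where the 3072 bytes come from.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not from the patch. */
#define PV_EOI_ENABLED   0x1ULL      /* bit 0: enable flag */
#define PV_EOI_ADDR_MASK (~0x3ULL)   /* bits 63:2: dword-aligned gpa */

/* Pack a dword-aligned gpa and the enable bit into one MSR value. */
static uint64_t pv_eoi_msr_encode(uint64_t gpa, bool enable)
{
	return (gpa & PV_EOI_ADDR_MASK) | (enable ? PV_EOI_ENABLED : 0);
}

/* True if the guest has switched the feature on. */
static bool pv_eoi_msr_enabled(uint64_t msr)
{
	return (msr & PV_EOI_ENABLED) != 0;
}

/* Guest physical address of the shared dword. */
static uint64_t pv_eoi_msr_gpa(uint64_t msr)
{
	return msr & PV_EOI_ADDR_MASK;
}

/* Extra per-cpu bytes when the flag is a dword instead of a byte. */
static size_t pv_eoi_extra_bytes(size_t ncpus)
{
	return ncpus * (sizeof(uint32_t) - sizeof(uint8_t));
}
```

With this encoding, writing 0 means "disabled", while gpa 0 with the
feature on is the value 1, so no separate disable MSR is needed.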

> > > > > > BTW have you added the new MSR to the msrs_to_save array? I forgot to
> > > > > > check.
> > > > > 
> > > > > I didn't yet. Trying to understand how will that affect
> > > > > cross-version migration - any input?
> > > > > 
> > > > Not sure. You need to check what userspace does with them.
> > > > 
> > > > > > > > > +static void apic_update_isr(struct kvm_lapic *apic)
> > > > > > > > > +{
> > > > > > > > > +	int vector;
> > > > > > > > > +	if (!eoi_enabled(apic->vcpu) ||
> > > > > > > > > +	    !apic->vcpu->arch.eoi.pending ||
> > > > > > > > > +	    eoi_get_pending(apic->vcpu))
> > > > > > > > > +		return;
> > > > > > > > > +	apic->vcpu->arch.eoi.pending = false;
> > > > > > > > > +	vector = apic_find_highest_isr(apic);
> > > > > > > > > +	if (vector == -1)
> > > > > > > > > +		return;
> > > > > > > > > +	apic_clear_vector(vector, apic->regs + APIC_ISR);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > We should just call apic_set_eoi() on exit if eoi.pending && !eoi_get_pending().
> > > > > > > > This removes the need for the function and its calls.
> > > > > > > 
> > > > > > > It's a bit of a waste: that one does all kind extra things
> > > > > > > which we know we don't need, some of the atomics. And it's datapath
> > > > > > > so extra stuff is not free.
> > > > > > > 
> > > > > > How much time those extra things are taking compared to vmexit you
> > > > > > already serving? And there is a good chance you will do them during
> > > > > > vmentry anyway while trying to inject (or just check for) new interrupt.
> > > > > 
> > > > > No need to do them twice :)
> > > > > 
> > > > > > > Probably a good idea to replace the call on MSR disable - I think
> > > > > > > apic_update_ppr is a better thing to call there.
> > > > > > > 
> > > > > > > Is there anything else that I missed?
> > > > > > I think that simple things are better than complex things if the end result is
> > > > > > the same :) Try it and see how much simpler it is.
> > > > > 
> > > > > It doesn't seem to be simpler at all. The common functionality is
> > > > > about 4 lines.
> > > > Send patch for us to see.
> > > 
> > > That's what you are replying to, no?
> > > You can see that it is 4 lines of code.
> > No. I mean something like patch below. Applies on top of yours. Did not
> > check that it works or even compiles.
> > 
> > > 
> > > > lapic changes should be minimal.
> > > 
> > > Exactly my motivation.
> > > 
> > My patch removes 13 lines more :)
> 
> I'll take a look, thanks.
> 
> > > > > 
> > > > > > Have you measured
> > > > > > that what you are trying to optimize actually worth optimizing? That you
> > > > > > can measure the optimization at all?
> > > > > 
> > > > > The claim is not that it's measureable. The claim is that
> > > > > it does not scale to keep adding things to do on each entry.
> > > > > 
> > > > Only if there is something to do. "Premature optimization is the root of
> > > > all evil". The PV eoi is about not exiting on eoi unnecessarily. You are
> > > > mixing this with trying to avoid calling eoi code for a given interrupt at
> > > > all.
> > > 
> > > I don't think this is what my patch does. EOI still clears ISR
> > > for each interrupt.
> > > 
> > > > Two different optimizations, do not try to lump them together.
> > > > > > > 
> > > > > > > > We already have
> > > > > > > > call to kvm_lapic_sync_from_vapic() on exit path which should be
> > > > > > > > extended to do the above.
> > > > > > > 
> > > > > > > It already does this. It calls apic_set_tpr
> > > > > > > which calls apic_update_ppr which calls
> > > > > > > apic_update_isr.
> > > > > > > 
> > > > > > It does it only if vapic is in use (and it is usually not).
> > > > > 
> > > > > When it's not we don't need to update ppr and so
> > > > > no need to update isr on this exit.
> > > > If there was eoi we need to update both.
> > > 
> > > By same logic we should call update_ppr on each entry.
> > > The overhead is unlikely to be measureable either :).
> > > 
> > It is small enough for us to not care about it on RHEL6 where it is
> > called on each entry.
> 
> Exactly. So why do not we do it?
> 
Calling it on each entry is a little bit excessive; calling it on each
interrupt is OK. But even when it is called on each entry it does not
create performance problems.

> > > > > 
> > > > > > But the if()
> > > > > > is already there so we do not need to worry that one additional if() on
> > > > > > the exit path will slow KVM to the crawl.
> > > > > 
> > > > > The number of things we need to do on each entry keeps going up, if we
> > > > > just keep adding stuff it won't end well.
> > > > > 
> > > > You do not add stuff. The if() is already there.
> > > 
> > > 
> > > Your proposal was to check userspace eoi record
> > > each time when eoi is pending, no?
> > Yes.
> > 
> > > This would certainly add some overhead.
> > > 
> > Only when eoi is pending. This is rare.
> 
> This is exactly while guest handles an interrupt.
> It's not all that rare at all: e.g. device
> drivers cause an exit from interrupt
> handler by doing io.
So eoi will be coalesced with the io that the device driver does. Exactly
what we want. But in the grand scheme of things interrupts are rare :) The
optimization is disabled when interrupts are coming faster than they
are served anyway.
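
For reference, the exit-path check discussed in this thread (run the
regular EOI code when eoi.pending is set and the guest-visible flag has
been cleared) can be modelled as below. This is a self-contained sketch
with hypothetical stand-in types, not KVM code: shared_flag models what
eoi_get_pending() would read from guest memory, and model_set_eoi()
stands in for apic_set_eoi().

```c
#include <stdbool.h>
#include <stdint.h>

/* All types and names here are hypothetical stand-ins, not KVM code. */
struct pv_eoi_vcpu {
	bool     enabled;      /* guest enabled PV EOI through the MSR */
	bool     pending;      /* host is waiting for a guest-side EOI */
	uint8_t  shared_flag;  /* guest-visible byte; guest clears it on EOI */
	int      isr_vector;   /* highest in-service vector, -1 if none */
	unsigned eoi_count;    /* how many EOIs the host has processed */
};

/* Stand-in for the regular EOI path: retire the in-service vector. */
static void model_set_eoi(struct pv_eoi_vcpu *v)
{
	v->isr_vector = -1;
	v->eoi_count++;
}

/* Run on every exit; does real work only when an EOI is outstanding. */
static void model_sync_pv_eoi(struct pv_eoi_vcpu *v)
{
	if (!v->enabled || !v->pending)
		return;
	if (v->shared_flag)    /* guest has not signalled EOI yet */
		return;
	v->pending = false;
	model_set_eoi(v);
}
```

In the common case (no eoi pending) the cost is one branch, which is the
point made above about the if() already being on the exit path.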

> 
> > > I also find the logic easier to follow as is -
> > > it is contained in lapic.c without relying
> > > on being called from x86.c as just the right moment.
> > > 
> > See the patch. It change nothing outside of lapic.c.
> 
> I'll take a look, thanks.

--
			Gleb.