Re: [PATCH 08/31] nVMX: Fix local_vcpus_link handling

"Nadav Har'El" <nyh@xxxxxxxxxxxxxxxxxxx> · Tue, 24 May 2011 10:56:56 +0300

On Tue, May 24, 2011, Tian, Kevin wrote about "RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling":
> > +static inline void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
> > +{
> > +	vmcs_clear(loaded_vmcs->vmcs);
> > +	loaded_vmcs->cpu = -1;
> > +	loaded_vmcs->launched = 0;
> > +}
> > +
> 
> call it vmcs_init instead since you now remove original vmcs_init invocation,
> which more reflect the necessity of adding VMCLEAR for a new vmcs?

The best name for this function, I think, would have been loaded_vmcs_clear,
because this function isn't necessarily used to "init" - it's also called to
VMCLEAR an old vmcs (and flush its content back to memory) - in that sense
it is definitely not a "vmcs_init".

Unfortunately, I already have a whole chain of functions with this name :(
the existing loaded_vmcs_clear() does an IPI to the CPU which has this VMCS
loaded, and causes it to run __loaded_vmcs_clear(), which in turn calls
the above loaded_vmcs_init(). I wish I could call all three functions
loaded_vmcs_clear(), but of course I can't. If anyone reading this has a good
suggestion on how to name these three functions, please let me know.

> > +static void __loaded_vmcs_clear(void *arg)
> >  {
> > -	struct vcpu_vmx *vmx = arg;
> > +	struct loaded_vmcs *loaded_vmcs = arg;
> >  	int cpu = raw_smp_processor_id();
> > 
> > -	if (vmx->vcpu.cpu == cpu)
> > -		vmcs_clear(vmx->vmcs);
> > -	if (per_cpu(current_vmcs, cpu) == vmx->vmcs)
> > +	if (loaded_vmcs->cpu != cpu)
> > +		return; /* cpu migration can race with cpu offline */
> 
> what do you mean by "cpu migration" here? why does 'cpu offline'
> matter here regarding to the cpu change?

__loaded_vmcs_clear() is typically called in one of two cases: "cpu migration"
means that a guest that used to run on one CPU, and had its VMCS loaded there,
suddenly needs to run on a different CPU, so we need to clear the VMCS on
the old CPU. "cpu offline" means that we want to take a certain CPU offline,
and before we do that we should VMCLEAR all VMCSs which were loaded on it.

The (vmx->cpu.cpu != cpu) case in __loaded_vmcs_clear should ideally never
happen: In the cpu offline path, we only call it for the loaded_vmcss which
we know for sure are loaded on the current cpu. In the cpu migration path,
loaded_vmcs_clear runs __loaded_vmcs_clear on the right CPU, which ensures that
equality.

But, there can be a race condition (this was actually explained to me a while
back by Avi - I never seen this happening in practice): Imagine that cpu
migration calls loaded_vmcs_clear, which tells the old cpu (via IPI) to
VMCLEAR this vmcs. But before that old CPU gets a chance to act on that IPI,
a decision is made to take it offline, and all loaded_vmcs loaded on it
(including the one in question) are cleared. When that CPU acts on this IPI,
it notices that vmx->cpu.cpu==-1, i.e., != cpu, so it doesn't need to do
anything (in the new version of the code, I made this more explicit, by
returning immediately in this case).

At least this is the theory. As I said, I didn't see this problem in practice
(unsurprising, since I never offlined any CPU). Maybe Avi or someone else can
comment more about this (vmx->cpu.cpu == cpu) check, which existed before
my patch - in __vcpu_clear().

> > +static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
> > +{
> > +	if (!loaded_vmcs->vmcs)
> > +		return;
> > +	loaded_vmcs_clear(loaded_vmcs);
> > +	free_vmcs(loaded_vmcs->vmcs);
> > +	loaded_vmcs->vmcs = NULL;
> > +}
> 
> prefer to not do cleanup work through loaded_vmcs since it's just a pointer
> to a loaded vmcs structure. Though you can carefully arrange the nested
> vmcs cleanup happening before it, it's not very clean and a bit error prone
> simply from this function itself. It's clearer to directly cleanup vmcs01, and
> if you want an assertion could be added to make sure loaded_vmcs doesn't
> point to any stale vmcs02 structure after nested cleanup step.	

I'm afraid I didn't understand what you meant here... Basically, this
free_loaded_vmcs() is just a shortcut for loaded_vmcs_clear() and free_vmcs(),
as doing both is needed in 3 places: nested_free_vmcs02,
nested_free_all_saved_vmcss, vmx_free_vcpu. The same function is needed
for both vmcs01 and vmcs02 VMCSs - in both cases when we don't need them any
more we need to VMCLEAR them and then free the VMCS memory. Note that this
function does *not* free the loaded_vmcs structure itself.

What's wrong with this?
Would you prefer that I remove this function and explictly call
loaded_vmcs_clear() and then free_vmcs() in all three places?

Thanks,
Nadav.

-- 
Nadav Har'El                        |      Tuesday, May 24 2011, 20 Iyyar 5771
nyh@xxxxxxxxxxxxxxxxxxx             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Linux: Because a PC is a terrible thing
http://nadav.harel.org.il           |to waste.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html