Re: [PATCH] kvm: make vcpu life cycle separated from kvm instance

Liu ping fan <kernelfans@xxxxxxxxx> · Tue, 6 Dec 2011 14:54:06 +0800

On Mon, Dec 5, 2011 at 4:41 PM, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
> On Mon, Dec 05, 2011 at 01:39:37PM +0800, Liu ping fan wrote:
>> On Sun, Dec 4, 2011 at 8:10 PM, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
>> > On Sun, Dec 04, 2011 at 07:53:37PM +0800, Liu ping fan wrote:
>> >> On Sat, Dec 3, 2011 at 2:26 AM, Jan Kiszka <jan.kiszka@xxxxxxxxxxx> wrote:
>> >> > On 2011-12-02 07:26, Liu Ping Fan wrote:
>> >> >> From: Liu Ping Fan <pingfank@xxxxxxxxxxxxxxxxxx>
>> >> >>
>> >> >> Currently, vcpu can be destructed only when kvm instance destroyed.
>> >> >> Change this to vcpu's destruction taken when its refcnt is zero,
>> >> >> and then vcpu MUST and CAN be destroyed before kvm's destroy.
>> >> >
>> >> > I'm lacking the big picture yet (would be good to have in the change log
>> >> > - at least I'm too lazy to read the code):
>> >> >
>> >> > What increments the refcnt, what decrements it again? IOW, how does user
>> >> > space controls the life-cycle of a vcpu after your changes?
>> >> >
>> >> In local APIC mode, delivering IPI to target APIC, target's refcnt is
>> >> incremented, and decremented when finished. At other times, using RCU to
>> > Why is this needed?
>> >
>> Suppose the following scene:
>>
>> #define kvm_for_each_vcpu(idx, vcpup, kvm) \
>>         for (idx = 0; \
>>              idx < atomic_read(&kvm->online_vcpus) && \
>>              (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>>              idx++)
>>
>> ------------------------------------------------------------------------------------------>
>> Here kvm_vcpu's destruction is called
>>               vcpup->vcpu_id ...  //oops!
>>
>>
> And this is exactly how your code looks. i.e you do not increment
> reference count in most of the loops, you only increment it twice
> (in pic_unlock() and kvm_irq_delivery_to_apic()) because you are using
> vcpu outside of rcu_read_lock() protected section and I do not see why
> not just extend protected section to include kvm_vcpu_kick(). As far as
> I can see this function does not sleep.
>
:-), I just wanted to minimize the RCU critical section. As you say, we
can extend the protected section to include kvm_vcpu_kick().

> What should protect vcpu from disappearing in your example above is RCU
> itself if you are using it right. But since I do not see any calls to
> rcu_assign_pointer()/rcu_dereference() I doubt you are using it right
> actually.
>
Sorry, but I do not think it is wrong. Please help me check my reasoning:

struct kvm_vcpu *kvm_vcpu_get(struct kvm_vcpu *vcpu)
{
	if (vcpu == NULL)
		return NULL;
	if (atomic_add_unless(&vcpu->refcount, 1, 0))	/* increment */
		return vcpu;
	return NULL;
}

void kvm_vcpu_put(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm;
	if (atomic_dec_and_test(&vcpu->refcount)) {	/* decrement */
		kvm = vcpu->kvm;
		mutex_lock(&kvm->lock);
		kvm->vcpus[vcpu->vcpu_id] = NULL;
		atomic_dec(&kvm->online_vcpus);
		mutex_unlock(&kvm->lock);
		call_rcu(&vcpu->head, kvm_vcpu_zap);
	}
}

The atomic increment and decrement are serialized by the cache coherence
protocol. So once we obtain a valid kvm_vcpu pointer through kvm_vcpu_get(),
it stays valid until we release it with kvm_vcpu_put(); only then can the
destruction happen.
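
To make the counting argument concrete, here is a minimal user-space
sketch of the same get/put pattern using C11 atomics. It is only an
illustration under stated assumptions, not the patch itself: struct
vcpu_sketch, inc_not_zero(), vcpu_sketch_get() and vcpu_sketch_put() are
hypothetical names, and the "zapped" flag merely stands in for the point
where the kernel code would call call_rcu(). It shows the counter
semantics only; it says nothing about whether the pointer handed to the
get function was itself looked up safely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct kvm_vcpu. */
struct vcpu_sketch {
	atomic_int refcount;	/* 0 means teardown has already begun */
	bool zapped;		/* set where the kernel would call call_rcu() */
};

/* Analog of atomic_add_unless(v, 1, 0): increment unless the value is 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Take a reference; fails once the count has dropped to zero. */
static struct vcpu_sketch *vcpu_sketch_get(struct vcpu_sketch *vcpu)
{
	if (vcpu == NULL)
		return NULL;
	return inc_not_zero(&vcpu->refcount) ? vcpu : NULL;
}

/* Drop a reference; the last put triggers the (simulated) teardown. */
static void vcpu_sketch_put(struct vcpu_sketch *vcpu)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	int old = atomic_fetch_sub(&vcpu->refcount, 1);

	assert(old > 0);
	if (old == 1)
		vcpu->zapped = true;
}
```

With an initial count of 1, a get followed by two puts reaches the
teardown exactly once, and any later get returns NULL because the
increment is refused at zero.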

Thanks and regards,
ping fan

> --
>                        Gleb.