Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
> The current code suffers from the following race condition:
> 
> thread-1                                    thread-2
> -----------------------------------------------------------
> 
> kvm_set_irq() {
>    rcu_read_lock()
>    irq_rt = rcu_dereference(table);
>    rcu_read_unlock();
> 
>                                        kvm_set_irq_routing() {
>                                           mutex_lock();
>                                           irq_rt = table;
>                                           rcu_assign_pointer();
>                                           mutex_unlock();
>                                           synchronize_rcu();
> 
>                                           kfree(irq_rt);
> 
>    irq_rt->entry->set(); /* bad */
> 
> -------------------------------------------------------------
> 
> Because the pointer is accessed outside of the read-side critical
> section.  There are two basic patterns we can use to fix this bug:
> 
> 1) Switch to sleeping-rcu and encompass the ->set() access within the
>    read-side critical section,
> 
>    OR
> 
> 2) Add reference counting to the irq_rt structure, and simply acquire
>    the reference from within the RSCS.
> 
> This patch implements solution (1).

Looks like a good transformation!  A few questions interspersed below.

> Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
> ---
> 
>  include/linux/kvm_host.h |    6 +++++-
>  virt/kvm/irq_comm.c      |   50 +++++++++++++++++++++++++++-------------------
>  virt/kvm/kvm_main.c      |    1 +
>  3 files changed, 35 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index bd5a616..1fe135d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -185,7 +185,10 @@ struct kvm {
> 
>  	struct mutex irq_lock;
>  #ifdef CONFIG_HAVE_KVM_IRQCHIP
> -	struct kvm_irq_routing_table *irq_routing;
> +	struct {
> +		struct srcu_struct            srcu;

Each structure has its own SRCU domain.  This is OK, but just asking
if that is the intent.  It does look like the SRCU primitives are
passed a pointer to the correct structure, and that the return value
from srcu_read_lock() gets passed into the matching srcu_read_unlock()
like it needs to be, so that is good.

> +		struct kvm_irq_routing_table *table;
> +	} irq_routing;
>  	struct hlist_head mask_notifier_list;
>  	struct hlist_head irq_ack_notifier_list;
>  #endif

[ . . . ]

> @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
>  	 * IOAPIC.  So set the bit in both. The guest will ignore
>  	 * writes to the unused one.
>  	 */
> -	rcu_read_lock();
> -	irq_rt = rcu_dereference(kvm->irq_routing);
> +	idx = srcu_read_lock(&kvm->irq_routing.srcu);
> +	irq_rt = rcu_dereference(kvm->irq_routing.table);
>  	if (irq < irq_rt->nr_rt_entries)
> -		hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
> -			irq_set[i++] = *e;
> -	rcu_read_unlock();
> +		hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {

What prevents the above list from changing while we are traversing it?
(Yes, presumably whatever was preventing it from changing before this
patch, but what?)

Mostly kvm->lock is held, but not always.  And if kvm->lock were held
all the time, there would be no point in using SRCU.  ;-)

> +			int r;
> 
> -	while(i--) {
> -		int r;
> -		r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level);
> -		if (r < 0)
> -			continue;
> +			r = e->set(e, kvm, irq_source_id, level);
> +			if (r < 0)
> +				continue;
> 
> -		ret = r + ((ret < 0) ? 0 : ret);
> -	}
> +			ret = r + ((ret < 0) ? 0 : ret);
> +		}
> +	srcu_read_unlock(&kvm->irq_routing.srcu, idx);
> 
>  	return ret;
>  }
> @@ -179,17 +178,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  	struct kvm_irq_ack_notifier *kian;
>  	struct hlist_node *n;
>  	int gsi;
> +	int idx;
> 
>  	trace_kvm_ack_irq(irqchip, pin);
> 
> -	rcu_read_lock();
> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> +	idx = srcu_read_lock(&kvm->irq_routing.srcu);
> +	gsi = rcu_dereference(kvm->irq_routing.table)->chip[irqchip][pin];
>  	if (gsi != -1)
>  		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
>  					 link)

And same question here -- what keeps the above list from changing while
we are traversing it?

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux