Re: [PATCH] KVM: arm64: arch_timer shouldn't assume the vcpu is loaded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi James,

On Wed, 8 Apr 2020 12:16:01 +0100
James Morse <james.morse@xxxxxxx> wrote:

> Hi Marc,
> 
> On 08/04/2020 11:07, Marc Zyngier wrote:
> > On Mon,  6 Apr 2020 16:03:55 +0100
> > James Morse <james.morse@xxxxxxx> wrote:
> >   
> >> kvm_arch_timer_get_input_level() needs to get the arch_timer_context for
> >> a particular vcpu, and uses kvm_get_running_vcpu() to find it.
> >>
> >> kvm_arch_timer_get_input_level() may be called to handle a user-space
> >> write to the redistributor, where the vcpu is not loaded. This causes
> >> kvm_get_running_vcpu() to return NULL:
> >> | Unable to handle kernel paging request at virtual address 0000000000001ec0
> >> | Mem abort info:
> >> |   ESR = 0x96000004
> >> |   EC = 0x25: DABT (current EL), IL = 32 bits
> >> |   SET = 0, FnV = 0
> >> |   EA = 0, S1PTW = 0
> >> | Data abort info:
> >> |   ISV = 0, ISS = 0x00000004
> >> |   CM = 0, WnR = 0
> >> | user pgtable: 4k pages, 48-bit VAs, pgdp=000000003cbf9000
> >> | [0000000000001ec0] pgd=0000000000000000
> >> | Internal error: Oops: 96000004 [#1] PREEMPT SMP
> >> | Modules linked in: r8169 realtek efivarfs ip_tables x_tables
> >> | CPU: 1 PID: 2615 Comm: qemu-system-aar Not tainted 5.6.0-rc7 #30
> >> | Hardware name: Marvell mvebu_armada-37xx/mvebu_armada-37xx, BIOS 2018.03-devel-18.12.3-gc9aa92c-armbian 02/20/2019
> >> | pstate: 00000085 (nzcv daIf -PAN -UAO)
> >> | pc : kvm_arch_timer_get_input_level+0x1c/0x68
> >> | lr : kvm_arch_timer_get_input_level+0x1c/0x68
> >>
> >> | Call trace:
> >> |  kvm_arch_timer_get_input_level+0x1c/0x68
> >> |  vgic_get_phys_line_level+0x3c/0x90
> >> |  vgic_mmio_write_senable+0xe4/0x130
> >> |  vgic_uaccess+0xe0/0x100
> >> |  vgic_v3_redist_uaccess+0x5c/0x80
> >> |  vgic_v3_attr_regs_access+0xf0/0x200
> >> |  nvgic_v3_set_attr+0x234/0x250
> >> |  kvm_device_ioctl_attr+0xa4/0xf8
> >> |  kvm_device_ioctl+0x7c/0xc0
> >> |  ksys_ioctl+0x1fc/0xc18
> >> |  __arm64_sys_ioctl+0x24/0x30
> >> |  do_el0_svc+0x7c/0x148
> >> |  el0_sync_handler+0x138/0x258
> >> |  el0_sync+0x140/0x180
> >> | Code: 910003fd f9000bf3 2a0003f3 97ff650c (b95ec001)
> >> | ---[ end trace 81287612d93f1e70 ]---
> >> | note: qemu-system-aar[2615] exited with preempt_count 1
> >>
> >> Loading the vcpu doesn't make a lot of sense for handling a device ioctl(),
> >> so instead pass the vcpu through to kvm_arch_timer_get_input_level(). Its
> >> not clear that an intid makes much sense without the paired vcpu.  
> > 
> > I don't fully agree with the analysis, Remember we are looking at the
> > state of the physical interrupt associated with a virtual interrupt, so
> > the vcpu doesn't quite make sense here if it isn't loaded.
> > 
> > What does it mean to look at the HW timer when we are not in the right
> > context? For all we know, it is completely random (the only guarantee
> > we have is that it is disabled, actually).  
> 
> > My gut feeling is that this is another instance where we should provide
> > specific userspace accessors that would only deal with the virtual
> > state, and leave anything that deals with the physical state of the
> > interrupt to be exercised only by the guest.  
> 
> > Does it make sense?  
> 
> Broadly, yes. Specifically ... I'm not familiar enough with this code to work out where
> such a change should go!
> 
> ~20 mins of grepping later~
> 
> Remove REGISTER_DESC_WITH_LENGTH() so that uaccess helpers have to be provided, and forbid
> NULL for the ur/uw values in REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED()...?

I'd suggest something like this (untested, of course):

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index d63881f60e1a..f51c6e939c76 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -409,10 +409,12 @@ static const struct vgic_register_region vgic_v2_dist_registers[] = {
 		NULL, vgic_mmio_uaccess_write_v2_group, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET,
-		vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_senable,
+		NULL, vgic_uaccess_write_senable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR,
-		vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_cenable,
+		NULL, vgic_uaccess_write_cenable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET,
 		vgic_mmio_read_pending, vgic_mmio_write_spending, NULL, NULL, 1,
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 77c8ba1a2535..a9c45048fadb 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -538,10 +538,12 @@ static const struct vgic_register_region vgic_v3_dist_registers[] = {
 		vgic_mmio_read_group, vgic_mmio_write_group, NULL, NULL, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISENABLER,
-		vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_senable,
+		NULL, vgic_uaccess_write_senable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICENABLER,
-		vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1,
+		vgic_mmio_read_enable, vgic_mmio_write_cenable,
+	       NULL, vgic_uaccess_write_cenable, 1,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISPENDR,
 		vgic_mmio_read_pending, vgic_mmio_write_spending,
@@ -609,11 +611,13 @@ static const struct vgic_register_region vgic_v3_rd_registers[] = {
 	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IGROUPR0,
 		vgic_mmio_read_group, vgic_mmio_write_group, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ISENABLER0,
-		vgic_mmio_read_enable, vgic_mmio_write_senable, 4,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISENABLER0,
+		vgic_mmio_read_enable, vgic_mmio_write_senable,
+		NULL, vgic_uaccess_write_senable, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ICENABLER0,
-		vgic_mmio_read_enable, vgic_mmio_write_cenable, 4,
+	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ICENABLER0,
+		vgic_mmio_read_enable, vgic_mmio_write_cenable,
+		NULL, vgic_uaccess_write_cenable, 4,
 		VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISPENDR0,
 		vgic_mmio_read_pending, vgic_mmio_write_spending,
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 4012cd68ac93..2ca11b05b17b 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -184,6 +184,48 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
 	}
 }
 
+int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val)
+{
+	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+	int i;
+	unsigned long flags;
+
+	for_each_set_bit(i, &val, len * 8) {
+		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+		raw_spin_lock_irqsave(&irq->irq_lock, flags);
+		irq->enabled = true;
+		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
+
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+
+	return 0;
+}
+
+int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val)
+{
+	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
+	int i;
+	unsigned long flags;
+
+	for_each_set_bit(i, &val, len * 8) {
+		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+
+		raw_spin_lock_irqsave(&irq->irq_lock, flags);
+		irq->enabled = false;
+		raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+
+	return 0;
+}
+
 unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
 				     gpa_t addr, unsigned int len)
 {
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 30713a44e3fa..327d0a6938e4 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -138,6 +138,14 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
 			     gpa_t addr, unsigned int len,
 			     unsigned long val);
 
+int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val);
+
+int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu,
+			       gpa_t addr, unsigned int len,
+			       unsigned long val);
+
 unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
 				     gpa_t addr, unsigned int len);
 


> 
> Or if that is too invasive, something like, (totally, untested):
> ----------------%<----------------
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 97fb2a40e6ba..30ae5f29e429 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -113,10 +113,11 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>                 struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> 
>                 raw_spin_lock_irqsave(&irq->irq_lock, flags);
> -               if (vgic_irq_is_mapped_level(irq)) {
> +               if (kvm_running_vcpu() && vgic_irq_is_mapped_level(irq)) {
>                         bool was_high = irq->line_level;
> 
>                         /*
> +                        * Unless we are running due to a user-space access,
>                          * We need to update the state of the interrupt because
>                          * the guest might have changed the state of the device
>                          * while the interrupt was disabled at the VGIC level.
> ----------------%<----------------

Huh, nice try! ;-) Unfortunately, I think there is more than the enable
register that suffers from a similar problem (see how the pending
register write is also accessing the HW state, even if accessed from
userspace).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux