Re: KVM/ARM status and branches

On 11/09/12 05:08, Christoffer Dall wrote:
> On Mon, Sep 10, 2012 at 4:59 PM, Alexander Graf <agraf@xxxxxxx> wrote:
>>
>>
>> On 10.09.2012, at 22:07, Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> On Mon, Sep 10, 2012 at 4:04 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>>> On Mon, 10 Sep 2012 10:32:04 -0400, Christoffer Dall
>>>> <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> On Mon, Sep 10, 2012 at 6:18 AM, Marc Zyngier <marc.zyngier@xxxxxxx>
>>>> wrote:
>>>>>> On 10/09/12 05:04, Christoffer Dall wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> We have a new branch, which will never be rebased and should always be
>>>>>>> bisectable and mergeable. It's kvm-arm-master and can be found here:
>>>>>>>
>>>>>>> git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-master
>>>>>>>
>>>>>>> (or pointy-clicky web interface:)
>>>>>>> https://github.com/virtualopensystems/linux-kvm-arm
>>>>>>>
>>>>>>> This branch merges 3.6-rc5.
>>>>>>>
>>>>>>> The branch also merges all Marc Zyngier's timer, vgic and hyp-mode
>>>>>>> boot branches.
>>>>>>>
>>>>>>> It also merges the IRQ injection API changes (touching
>>>>>>> KVM_IRQ_LINE), as there haven't been any other comments on them. These
>>>>>>> require qemu patches, which can be found here:
>>>>>>>
>>>>>>> git://github.com/virtualopensystems/qemu.git kvm-arm-irq-api
>>>>>>>
>>>>>>> (or pointy-clicky web interface:)
>>>>>>> https://github.com/virtualopensystems/qemu
>>>>>>>
>>>>>>> Two things are outstanding on my end before I attempt an initial
>>>>>>> upstream:
>>>>>>> 1. We have a bug when we start swapping in the host: the guest kernel
>>>>>>> dies with "BUG: Bad page state..." and all sorts of bad things follow.
>>>>>>> If we really stress the host with memory pressure, the host can
>>>>>>> also crash, or at least become completely unresponsive. The same test
>>>>>>> on a KVM kernel without any VMs does not cause this BUG.
>>>>>>
>>>>>> Is that the one you're seeing?
>>>>>>
>>>>>> [  312.189234] ------------[ cut here ]------------
>>>>>> [  312.203056] kernel BUG at arch/arm/kvm/mmu.c:382!
>>>>>> [  312.217134] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP THUMB2
>>>>>> [  312.235376] Modules linked in:
>>>>>> [  312.244515] CPU: 0    Not tainted  (3.6.0-rc3+ #40)
>>>>>> [  312.259118] PC is at stage2_clear_pte+0x128/0x134
>>>>>> [  312.273193] LR is at kvm_unmap_hva+0x97/0xa0
>>>>>> [  312.285967] pc : [<c001e10c>]    lr : [<c001ee0f>]    psr: 60000133
>>>>>> [  312.285967] sp : caa25998  ip : df97a028  fp : 00800000
>>>>>> [  312.320355] r10: 873b5b5f  r9 : c8654000  r8 : 01c55000
>>>>>> [  312.355532] r7 : 00000000  r6 : df249c00  r5 : c688fb80  r4 : df249ccc
>>>>>> [  312.375076] r3 : 00000000  r2 : 2e001000  r1 : 00000000  r0 : 00000000
>>>>>> [  312.375076] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
>>>>>> [  312.396962] Control: 70c5387d  Table: 8a9bbb00  DAC: fffffffd
>>>>>> [  312.414161] Process hackbench (pid: 7207, stack limit = 0xcaa242f8)
>>>>>>
>>>>>
>>>>> FYI, this is what I'm seeing in the guest in more details (this
>>>>> couldn't be the icache stuff could it?):
>>>>
>>>> [...]
>>>>
>>>> I do see similar things - and some others. It is really random.
>>>>
>>>> I tried nuking the icache, without any success. I spent the whole day
>>>> adding flushes on every code path, without making a real difference. And
>>>> the more I think of it, the more I'm convinced that this is caused by the
>>>> way we manipulate pages without telling the kernel what we're actually
>>>> doing.
>>>>
>>>> What happens is that as far as the kernel is concerned, the qemu pages are
>>>> always clean. We never flag a page dirty, because it is the guest that
>>>> performs the write, and we're completely oblivious to that path. What I
>>>> think happens is that the guest writes some data to the cache (or even to
>>>> memory) and the underlying page gets evicted without being sync-ed first,
>>>> because nobody knows it's been modified.
>>>>
>>>> If my gut feeling is true, we need to tell the kernel that as soon as a
>>>> page is inserted in stage-2, it is assumed to be dirty. We could always
>>>> mark them read-only and resolve the fault at a later time, but that isn't
>>>> important at the moment. And we need to flag it in the qemu mapping,
>>>> because it is the one being evicted.
>>>>
>>>> What do you think?
>>>>
>>> I think this is definitely a good bet, I remember Alex Graf saying
>>> something about KVM taking care of the dirty bit for us, but I'm not
>>> sure.
>>
>> There is a kvm helper function to mark a gfn dirty. You need to call that one :).
>>
>>>
>>> We already mark pages read-only if that makes sense, so we could avoid
>>> setting the dirty bit there.
>>
>> You may want to use the dirty bit for VGA dirty bitmap information, so yes, doing it on demand makes sense.
>>
> 
> 
> thanks, so it turns out that *both* fixes were needed: we were in
> fact seeing the infamous icache bug, and of course the dcache bug was
> going to blow things up as well.
> 
> I tested with three VMs, each using 400MB (/dev/random > <ramfs>/foo)
> and all running cyclictest and hackbench. I put the host under memory
> pressure and saw a lot of swapping, and everything still worked. I then
> ran KSM, which didn't amount to all that much; things kept working. I
> replaced all the /dev/random content with /dev/zero, ran KSM again, and
> saw all the memory being swapped back in and freed. I wrote a few pages
> in the ramfs files in two guests to break COW, and everything was still
> running beautifully after more than 45 minutes. I'm happy at this point;
> see the patch in a separate e-mail.

Even with this patch, I can make the VM fall over. If you want to
reproduce my setup:

- Boot TC2 with mem=512MB of RAM.
- Activate some swap
- Run "hackbench 100 process 1000" on the host, preferably in a loop.
This already requires more than 512MB of RAM.
- Start a VM (a full Linaro install in my case), booting off MMC
emulation, with 480MB of RAM (it may take a long while to start)
- If you manage to reach a prompt, run "hackbench 70 process 1000" in
the VM.

It eventually dies a painful death.

I found a couple of problems with that patch:
- The kvm_release_pfn_dirty() call must be in the critical section,
otherwise you have a window where the page can be modified and evicted
before being marked dirty.
- Relying on the page being writeable or not seems to be the core
problem. Imagine the following situation:
  * Page is mapped writeable to load some executable -> dirty
  * Page is swapped out
  * Page is swapped back in due to instruction fetch -> !dirty
  * Page evicted again, code is terminally lost
  * Page faulted in again, kaboom.

With the attached patch, the above test (slowly) ran to completion. I'm
not sure my analysis is completely sound, but the patch definitely fixes
something.

	M.
-- 
Jazz is not dead. It just smells funny...
From ecdf712d22b9845238a1d9eaed0e55805b11aa0c Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier@xxxxxxx>
Date: Tue, 11 Sep 2012 13:11:31 +0100
Subject: [PATCH] ARM: KVM: Mark faulted-in page unconditionally dirty

Relying on the page being writeable or not seems to be a problem.
Imagine the following situation:
  * Page is mapped writeable to load some executable -> dirty
  * Page is swapped out
  * Page is swapped back in due to instruction fetch -> !dirty
  * Page evicted again, code is terminally lost
  * Page faulted in again, kaboom.

Marking the page dirty as soon as it is mapped fixes this.

Also move the kvm_release_pfn_dirty() call into the critical section,
so the page cannot be evicted before being marked dirty.

Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
---
 arch/arm/kvm/mmu.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 99a2b61..9ae839c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -555,13 +555,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	spin_lock(&vcpu->kvm->arch.pgd_lock);
 	stage2_set_pte(vcpu->kvm, memcache, fault_ipa, &new_pte);
+	kvm_release_pfn_dirty(pfn);
 	spin_unlock(&vcpu->kvm->arch.pgd_lock);
+	return 0;
 
 out:
-	if (writable && !ret)
-		kvm_release_pfn_dirty(pfn);
-	else
-		kvm_release_pfn_clean(pfn);
+	put_page(pfn_to_page(pfn));
 
 	return ret;
 }
-- 
1.7.12
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
