Re: KVM/ARM status and branches

Alexander Graf <agraf@xxxxxxx> · Mon, 10 Sep 2012 22:59:33 +0200

On 10.09.2012, at 22:07, Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, Sep 10, 2012 at 4:04 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On Mon, 10 Sep 2012 10:32:04 -0400, Christoffer Dall
>> <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> On Mon, Sep 10, 2012 at 6:18 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>>> On 10/09/12 05:04, Christoffer Dall wrote:
>>>>> Hello,
>>>>> 
>>>>> We have a new branch, which will never be rebased and should always be
>>>>> bisectable and mergeable. It's kvm-arm-master and can be found here:
>>>>> 
>>>>> git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-master
>>>>> 
>>>>> (or pointy-clicky web interface:)
>>>>> https://github.com/virtualopensystems/linux-kvm-arm
>>>>> 
>>>>> This branch merges 3.6-rc5
>>>>> 
>>>>> The branch also merges all of Marc Zyngier's timer, vgic and hyp-mode
>>>>> boot branches.
>>>>> 
>>>>> It is also merged with the IRQ injection API changes (touched
>>>>> KVM_IRQ_LINE), as there haven't been any other comments on this. This
>>>>> requires qemu patches, which can be found here:
>>>>> 
>>>>> git://github.com/virtualopensystems/qemu.git kvm-arm-irq-api
>>>>> 
>>>>> (or pointy-clicky web interface:)
>>>>> https://github.com/virtualopensystems/qemu
>>>>> 
>>>>> Two things are outstanding on my end before I attempt an initial
>>>>> upstream;
>>>>> 1. We have a bug when we start swapping in the host: the guest kernel
>>>>> dies with "BUG: Bad page state..." and all sorts of bad things follow.
>>>>> If we really stress the host with memory pressure, it seems that the
>>>>> host can also crash, or at least become completely unresponsive. The
>>>>> same test on a KVM kernel without any VMs does not cause this BUG.
>>>> 
>>>> Is that the one you're seeing?
>>>> 
>>>> [  312.189234] ------------[ cut here ]------------
>>>> [  312.203056] kernel BUG at arch/arm/kvm/mmu.c:382!
>>>> [  312.217134] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP THUMB2
>>>> [  312.235376] Modules linked in:
>>>> [  312.244515] CPU: 0    Not tainted  (3.6.0-rc3+ #40)
>>>> [  312.259118] PC is at stage2_clear_pte+0x128/0x134
>>>> [  312.273193] LR is at kvm_unmap_hva+0x97/0xa0
>>>> [  312.285967] pc : [<c001e10c>]    lr : [<c001ee0f>]    psr: 60000133
>>>> [  312.285967] sp : caa25998  ip : df97a028  fp : 00800000
>>>> [  312.320355] r10: 873b5b5f  r9 : c8654000  r8 : 01c55000
>>>> [  312.335990] r7 : 00000000  r6 : df249c00  r5 : c688fb80  r4 : df249ccc
>>>> [  312.355532] r3 : 00000000  r2 : 2e001000  r1 : 00000000  r0 : 00000000
>>>> [  312.375076] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
>>>> [  312.396962] Control: 70c5387d  Table: 8a9bbb00  DAC: fffffffd
>>>> [  312.414161] Process hackbench (pid: 7207, stack limit = 0xcaa242f8)
>>>> 
>>> 
>>> FYI, this is what I'm seeing in the guest in more detail (this
>>> couldn't be the icache stuff, could it?):
>> 
>> [...]
>> 
>> I do see similar things - and some others. It is really random.
>> 
>> I tried nuking the icache without any success. I spent the whole day
>> adding flushes on every code path, without making a real difference. And
>> the more I think of it, the more I'm convinced that this is caused by the
>> way we manipulate pages without telling the kernel what we're actually
>> doing.
>> 
>> What happens is that as far as the kernel is concerned, the qemu pages are
>> always clean. We never flag a page dirty, because it is the guest that
>> performs the write, and we're completely oblivious to that path. What I
>> think happens is that the guest writes some data to the cache (or even to
>> memory) and the underlying page gets evicted without being sync-ed first,
>> because nobody knows it's been modified.
>> 
>> If my gut feeling is true, we need to tell the kernel that as soon as a
>> page is inserted in stage-2, it is assumed to be dirty. We could always
>> mark them read-only and resolve the fault at a later time, but that isn't
>> important at the moment. And we need to flag it in the qemu mapping,
>> because it is the one being evicted.
>> 
>> What do you think?
>> 
> I think this is definitely a good bet, I remember Alex Graf saying
> something about KVM taking care of the dirty bit for us, but I'm not
> sure.

There is a KVM helper function to mark a gfn dirty. You need to call that one :).

> 
> We already mark pages read-only if that makes sense, so we could avoid
> setting the dirty bit there.

You may want to use the dirty bit for VGA dirty bitmap information, so yes, doing it on demand makes sense.
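
To make the on-demand idea concrete, here is a rough sketch as a self-contained toy model, not kernel code. Every toy_* name below is hypothetical and made up for illustration. In the generic KVM code the gfn-level helper is mark_page_dirty(kvm, gfn), which the toy does not call. The assumption is simply that a read fault maps the page read-only and leaves it clean, while a write fault makes it writable and sets its bit in a per-slot dirty bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names belong to the real KVM API. */
#define TOY_NPAGES        64
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_BITMAP_LONGS  ((TOY_NPAGES + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Hypothetical stand-in for a memory slot with stage-2 state per guest page. */
struct toy_memslot {
    bool mapped[TOY_NPAGES];    /* page has a stage-2 mapping */
    bool writable[TOY_NPAGES];  /* stage-2 mapping allows guest writes */
    unsigned long dirty_bitmap[TOY_BITMAP_LONGS];
};

static void toy_slot_init(struct toy_memslot *slot)
{
    memset(slot, 0, sizeof(*slot));
}

/* Record that the guest may have modified this guest frame number. */
static void toy_mark_gfn_dirty(struct toy_memslot *slot, size_t gfn)
{
    assert(gfn < TOY_NPAGES);
    slot->dirty_bitmap[gfn / TOY_BITS_PER_LONG] |= 1UL << (gfn % TOY_BITS_PER_LONG);
}

static bool toy_gfn_is_dirty(const struct toy_memslot *slot, size_t gfn)
{
    assert(gfn < TOY_NPAGES);
    return (slot->dirty_bitmap[gfn / TOY_BITS_PER_LONG] >> (gfn % TOY_BITS_PER_LONG)) & 1UL;
}

/*
 * Sketch of a stage-2 fault handler: a read fault maps the page read-only
 * and leaves it clean; a write fault upgrades it to writable and marks it
 * dirty at that point, so only pages the guest wrote to are flagged.
 */
static void toy_handle_fault(struct toy_memslot *slot, size_t gfn, bool is_write)
{
    assert(gfn < TOY_NPAGES);
    slot->mapped[gfn] = true;
    if (is_write) {
        slot->writable[gfn] = true;
        toy_mark_gfn_dirty(slot, gfn);
    }
}
```

Because the bit is only set on the write fault, the same bitmap stays useful for things like the VGA case: pages the guest has only read remain clean.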

Alex

> 
> I will try this out right away.
> 
> -Christoffer
