Re: Intermittent guest kernel crashes with v4.5-rc6.

On 03/03/16 14:26, Shanker Donthineni wrote:
> 
> 
> On 03/03/2016 08:03 AM, Marc Zyngier wrote:
>> On 03/03/16 13:25, Shanker Donthineni wrote:
>>>
>>> On 03/02/2016 11:35 AM, Marc Zyngier wrote:
>>>> On 02/03/16 15:48, Shanker Donthineni wrote:
>>>>
>>>>> We haven't started running heavy workloads in VMs. So far we
>>>>> have noticed this random behavior only during guest
>>>>> kernel boot (at EL1).
>>>>>
>>>>> We didn't see this problem on 4.3 kernel. Do you think it is
>>>>> related to TLB conflicts?
>>>> I cannot imagine why a DSB would solve a TLB conflict. But the fact that
>>>> you didn't see it crashing on 4.3 is a good indication that something
>>>> else is at play.
>>>>
>>>> In 4.5, we've rewritten a large part of KVM in C, which has changed the
>>>> ordering of the various accesses a lot. It could be that a latent
>>>> problem is now exposed more widely.
>>>>
>>>> Can you try moving this DSB around and find out what is the earliest
>>>> point where it solves this problem? Some sort of bisection?
>>> The furthest up I can move 'dsb ishst' is the beginning of
>>> __guest_enter(), but not outside this function.
>>>
>>> I don't understand why it fails with the code below; the branch
>>> instruction seems to be causing the problem.
>>>
>>>     /* Jump in the fire! */
>>> +  dsb(ishst);
>>>     exit_code = __guest_enter(vcpu, host_ctxt);
>>>     /* And we're baaack! */
>> That's very worrying. I can't see how the branch can have an influence
>> on the DSB (nor why the DSB has an influence on the rest of the
>> execution, btw).
>>
>> What if you replace the DSB with an ISB? Do you observe a similar
>> behaviour (works if the barrier is in __guest_enter, but not if it is
>> outside)?
> I have already tried with an ISB, without success. I did another
> experiment: flushing the stage-2 TLBs before calling __guest_enter()
> fixed the problem.

I suspected something like that. But it is such a massive hammer that it
will hide any sort of subtle bug (HW *and* SW).

> 
>> Another thing worth looking at is what happened just before we decided
>> to get back into the guest. Or to put it differently, what was the
>> reason to exit in the first place. Was it a Stage-2 fault by any chance?
> 
> I will collect as much debug data as possible and send you the
> results. I went through your refactored KVM C code and did not
> find anything suspicious. I am thinking maybe Qualcomm CPUs
> have very aggressive prefetch logic that is causing the problem.

OK. Please keep me posted about your findings. Also maybe involving some
HW people would be a good idea (running something in an emulator, for
example...).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


