Re: Intermittent guest kernel crashes with v4.5-rc6.

On 03/07/2016 10:36 PM, Shanker Donthineni wrote:
> On 03/03/2016 08:38 AM, Marc Zyngier wrote:
>> On 03/03/16 14:26, Shanker Donthineni wrote:
>>> On 03/03/2016 08:03 AM, Marc Zyngier wrote:
>>>> On 03/03/16 13:25, Shanker Donthineni wrote:
>>>>> On 03/02/2016 11:35 AM, Marc Zyngier wrote:
>>>>>> On 02/03/16 15:48, Shanker Donthineni wrote:
>>>>>>
>>>>>>> We haven't started running heavy workloads in VMs. So far we
>>>>>>> have noticed this random behavior only during guest kernel
>>>>>>> boot (at EL1).
>>>>>>>
>>>>>>> We didn't see this problem on 4.3 kernel. Do you think it is
>>>>>>> related to TLB conflicts?
>>>>>> I cannot imagine why a DSB would solve a TLB conflict. But the fact
>>>>>> that you didn't see it crashing on 4.3 is a good indication that
>>>>>> something else is at play.
>>>>>>
>>>>>> In 4.5, we've rewritten a large part of KVM in C, which has changed the
>>>>>> ordering of the various accesses a lot. It could be that a latent
>>>>>> problem is now exposed more widely.
>>>>>>
>>>>>> Can you try moving this DSB around and find out what is the earliest
>>>>>> point where it solves this problem? Some sort of bisection?
>>>>> The earliest point I can move 'dsb ishst' to is the beginning of
>>>>> __guest_enter(), but not outside of this function.
>>>>>
>>>>> I don't understand why it fails with the code below; the branch
>>>>> instruction seems to be causing problems.
>>>>>
>>>>>     /* Jump in the fire! */
>>>>> +  dsb(ishst);
>>>>>     exit_code = __guest_enter(vcpu, host_ctxt);
>>>>>     /* And we're baaack! */
>>>> That's very worrying. I can't see how the branch can have an influence
>>>> on the DSB (nor why the DSB has an influence on the rest of the
>>>> execution, btw).
>>>>
>>>> What if you replace the DSB with an ISB? Do you observe a similar
>>>> behaviour (works if the barrier is in __guest_enter, but not if it is
>>>> outside)?
>>> I have already tried with an ISB, without success. I did another
>>> experiment: flushing the stage-2 TLBs before calling __guest_enter()
>>> fixed the problem.
>> I suspected something like that. But it is such a massive hammer that it
>> will hide any sort of subtle bug (HW *and* SW).
>>
>>>> Another thing worth looking at is what happened just before we decided
>>>> to get back into the guest. Or to put it differently, what was the
>>>> reason to exit in the first place. Was it a Stage-2 fault by any chance?
>>> I will collect as much debug data as possible and send you the
>>> results. I went through your refactored KVM C code and did not
>>> find anything suspicious. I am thinking that maybe Qualcomm CPUs
>>> have very aggressive prefetch logic that is causing the problem.
>> OK. Please keep me posted about your findings. Also maybe involving some
>> HW people would be a good idea (running something in an emulator, for
>> example...).

This has been confirmed to be a hardware defect with a firmware workaround.

Regards,
Christopher Covington

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


