On 2017-01-30 09:38, Will Deacon wrote:
On Mon, Jan 30, 2017 at 09:33:50AM -0500, Sinan Kaya wrote:
On 1/30/2017 9:23 AM, Nate Watterson wrote:
> On 2017-01-30 08:59, Sinan Kaya wrote:
>> On 1/30/2017 7:22 AM, Robin Murphy wrote:
>>> On 29/01/17 17:53, Sinan Kaya wrote:
>>>> On 1/24/2017 7:37 AM, Lorenzo Pieralisi wrote:
>>>>> [+hanjun, tomasz, sinan]
>>>>>
>>>>> It is quite a key patchset, I would be glad if they can test on their
>>>>> respective platforms with IORT.
>>>>>
>>>>
>>>> Tested on top of 4.10-rc5.
>>>>
>>>> 1. Platform Hidma device passed dmatest
>>>> 2. Seeing some USB stalls on a platform USB device.
>>>> 3. PCIe NVMe drive probed and worked fine with MSI interrupts after boot.
>>>> 4. NVMe driver didn't probe following a hotplug insertion and received an
>>>> SMMU error event during the insertion.
>>>
>>> What was the SMMU error - a translation/permission fault (implying the
>>> wrong DMA ops) or a bad STE fault (implying we totally failed to tell
>>> the SMMU about the device at all)?
>>>
>>
>> root@ubuntu:/sys/bus/pci/slots/4# echo 0 > power
>>
>> [ 204.698522] iommu: Removing device 0003:01:00.0 from group 0
>> [ 204.708704] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down
>> [ 204.708723] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down event
>> ignored; already powering off
>>
>> root@ubuntu:/sys/bus/pci/slots/4#
>>
>> [ 254.820440] iommu: Adding device 0003:01:00.0 to group 8
>> [ 254.820599] nvme nvme0: pci function 0003:01:00.0
>> [ 254.820621] nvme 0003:01:00.0: enabling device (0000 -> 0002)
>> [ 261.948558] arm-smmu-v3 arm-smmu-v3.0.auto: event 0x0a received:
>> [ 261.948561] arm-smmu-v3 arm-smmu-v3.0.auto: 0x000001000000000a
>> [ 261.948563] arm-smmu-v3 arm-smmu-v3.0.auto: 0x0000000000000000
>> [ 261.948564] arm-smmu-v3 arm-smmu-v3.0.auto: 0x0000000000000000
>> [ 261.948566] arm-smmu-v3 arm-smmu-v3.0.auto: 0x0000000000000000
> Looks like C_BAD_CD. Can you please try with:
> iommu/arm-smmu-v3: Clear prior settings when updating STEs
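
For anyone reading the event dump above, here is a minimal Python sketch of how the first 64-bit word could be decoded. It assumes the event ID sits in bits [7:0] and the StreamID in bits [63:32], which is my reading of the SMMUv3 event record layout; the table is deliberately partial and decode_event() is a hypothetical helper, not something from the driver.

```python
# Partial table of SMMUv3 event IDs (assumed from the architecture spec;
# only a handful of entries are listed here).
EVENT_NAMES = {
    0x02: "C_BAD_STREAMID",
    0x04: "C_BAD_STE",
    0x0A: "C_BAD_CD",
    0x10: "F_TRANSLATION",
    0x13: "F_PERMISSION",
}


def decode_event(dword0):
    """Split the first event-record word into (event name, StreamID).

    Assumption: event ID in bits [7:0], StreamID in bits [63:32].
    """
    event_id = dword0 & 0xFF
    stream_id = (dword0 >> 32) & 0xFFFFFFFF
    name = EVENT_NAMES.get(event_id, "unknown(0x%02x)" % event_id)
    return name, stream_id


# First word from the log above.
name, sid = decode_event(0x000001000000000A)
print(name, hex(sid))  # C_BAD_CD 0x100
```

A StreamID of 0x100 would be consistent with device 0003:01:00.0 (bus 01, devfn 0) if the platform maps PCI requester IDs to StreamIDs one-to-one, but that mapping is an assumption about this system.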
This resolved the issue. Can we pull Nate's patch into 4.10 so that I
don't see this issue again?
I already sent the pull request to Joerg for 4.11. Do you see this
problem without Sricharan's patches (i.e. vanilla mainline)? If so,
we'll need to send the patch to stable after -rc1.
Using vanilla mainline, I see it most commonly when directly assigning
a device to a guest machine. I think I've also seen it after removing
and then re-adding a PCI device. Basically, it can happen any time an
STE's CTX pointer is changed from a non-NULL value while STE[CFG]
indicates that translation will be performed.
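
To illustrate the failure mode, here is a rough Python sketch of one way stale bits could survive an STE update. It is not the driver code: the field positions (CFG in bits [3:1], context pointer in bits [51:6], valid in bit 0) are my assumptions about the STE layout, and all function names are hypothetical.

```python
# Assumed layout of the first STE dword (illustrative only).
STE_VALID = 1 << 0
CFG_SHIFT, CFG_MASK = 1, 0x7
CFG_S1_TRANS = 0x5                                  # stage-1 translation
CTXPTR_MASK = ((1 << 52) - 1) & ~((1 << 6) - 1)     # bits [51:6]


def update_keep_old_bits(old_dword0, new_cd_ptr):
    """Hazardous update: clears only CFG, so the old pointer bits remain
    and the new pointer is ORed on top of them."""
    val = old_dword0 & ~(CFG_MASK << CFG_SHIFT)
    val |= (new_cd_ptr & CTXPTR_MASK) | (CFG_S1_TRANS << CFG_SHIFT) | STE_VALID
    return val


def update_rebuild(new_cd_ptr):
    """Safer update: rebuild the whole dword from scratch."""
    return (new_cd_ptr & CTXPTR_MASK) | (CFG_S1_TRANS << CFG_SHIFT) | STE_VALID


def ctx_ptr(dword0):
    """Extract the context pointer field."""
    return dword0 & CTXPTR_MASK


old = update_rebuild(0x10000040)      # STE already points at a context
new_ptr = 0x20000080                  # e.g. after hotplug re-add

bad = update_keep_old_bits(old, new_ptr)
good = update_rebuild(new_ptr)

print(hex(ctx_ptr(bad)))    # 0x300000c0, neither the old nor the new pointer
print(hex(ctx_ptr(good)))   # 0x20000080
```

With a NULL (zero) previous pointer, both variants give the same result, which would match the observation that the problem only shows up when the pointer changes from a non-NULL value.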
Nate
Will
--
Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
Linux Foundation Collaborative Project.