On 3/1/2018 7:03 PM, Bjorn Helgaas wrote:
>> 3. The last one is adapter gets into fuzzy state due to not coming
>> out of clean state in the second time init and being rejected by
>> SMMUv3 multiple times.
>>
>> [ 16.093441] pci 0000:01:00.0: aer_status: 0x00040000, aer_mask: 0x00000000
>> [ 16.099356] pci 0000:01:00.0: Malformed TLP
>> [ 16.103522] pci 0000:01:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID
>> [ 16.110900] pci 0000:01:00.0: aer_uncor_severity: 0x00062011
>> [ 16.116543] pci 0000:01:00.0: TLP Header: 0a00a000 00008100 01010100 00000000
>
> I'm not clear on this. I don't remember what an IOMMU fault looks
> like to an Endpoint. Are you saying that if an Endpoint sees too many
> of those faults, it gets into this "fuzzy state" (whatever that is :))?
> Is this a hardware defect? Do we care (this is a kdump kernel, after
> all)? If we do care, can we fix the device by resetting it?

fuzzy=funky=funny=weird

Regardless of what we do in the IOMMU driver, I think we still have to
reset the endpoint in order to get a clean initialization. I'm not sure
that all endpoint drivers can recover an adapter from a live state.

I wasn't expecting to see a Malformed TLP error. My guess is that it was
caused either by the SMMU returning a CA or UR to the endpoint, or by the
adapter still being live in the middle of driver initialization.

I think we do care about the adapter coming up properly. Otherwise, how
would you collect the dumps from the system? I was expecting to come in
through the network interface and download the dump from the target.

That's why I was suggesting FLR/PM reset etc. when we know that we are
booting a kdump kernel.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
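
For reference, the aer_status and aer_uncor_severity values in the quoted log can be decoded by hand. The following is a minimal sketch, not kernel code: it assumes the standard bit layout of the PCIe AER Uncorrectable Error Status and Severity registers, and the names `AER_UNCOR_BITS`, `decode_uncor` and `is_fatal` are made up for illustration. It shows that the status value carries only the Malformed TLP bit, which is separate from the Completer Abort (CA) and Unsupported Request (UR) bits, and that this bit is marked fatal by the logged severity value.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status / Severity
# registers (assumed standard layout; bit 0 and other reserved bits omitted).
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_uncor(value):
    """Return the names of all known error bits set in a register value."""
    return [name for bit, name in sorted(AER_UNCOR_BITS.items())
            if value & (1 << bit)]


def is_fatal(status, severity):
    """A reported error is fatal if its bit is also set in the severity register."""
    return bool(status & severity)


# Values taken from the log quoted in this thread.
status = 0x00040000
severity = 0x00062011

print(decode_uncor(status))        # ['Malformed TLP']
print(is_fatal(status, severity))  # True
```

Since neither bit 15 (CA) nor bit 20 (UR) is set in aer_status, the log by itself does not show that the endpoint recorded a CA or UR.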