Hi Sinan
On 2019/1/26 1:09, Sinan Kaya wrote:
On 1/25/2019 9:28 AM, Dongdong Liu wrote:
I want to fix two issues with this patch.
1. For EP devices (such as a multi-function EP device) under the same bus,
when one of the EP devices encounters a non-fatal error, the error should be
reported only to the affected endpoint device; there is no need to broadcast
it to all of them. That is what the patch "PCI/AER: Report non-fatal errors
only to the affected endpoint" (v4.15) did, but the current code from PATCH [1]
broke this.
2. We found a NULL pointer dereference for the 74:02.0 device when the device
encounters a non-fatal error (firmware-first) on kernels after 4.19.
The topology is shown below.
Is it possible to split the patches into two to address these two different
issues?
Good suggestion, will do.
I can understand the first one, but the second one seems to be a mystery that
needs to be explored in detail.
+-[0000:74]-+-02.0 Huawei Technologies Co., Ltd. HiSilicon SAS 3.0 HBA
| \-03.0 Huawei Technologies Co., Ltd. HiSilicon AHCI HBA
74:02.0 is an RCiEP and does not sit under a root port.
The current code has the issue shown below; see [DD].
aer_recover_work_func()
    pcie_do_recovery() {
        /*
         * Error recovery runs on all subordinates of the first downstream port.
         * If the downstream port detected the error, it is cleared at the end.
         */
        if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
              pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
            dev = dev->bus->self; /* [DD]: dev becomes NULL for 74:02.0 (RCiEP), as it has no root port above it. */
        bus = dev->subordinate;   /* NULL pointer dereference when dev is NULL */
    }
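To make the failure mode concrete, here is a minimal user-space sketch of the bus selection above with a NULL check added. The struct names (pdev, pbus), the TYPE_* values and recovery_bus() are hypothetical simplifications, not kernel API, and falling back to the device's own bus when there is no upstream bridge is just one possible guard, not necessarily the fix that will be posted.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's struct pci_dev and
 * struct pci_bus, keeping only the fields used by the excerpt above. */
enum pcie_type { TYPE_ENDPOINT, TYPE_RC_END, TYPE_ROOT_PORT, TYPE_DOWNSTREAM };

struct pbus;

struct pdev {
	const char *name;
	enum pcie_type type;
	struct pbus *bus;         /* bus the device sits on */
	struct pbus *subordinate; /* bus below a bridge; NULL for endpoints */
};

struct pbus {
	struct pdev *self;        /* upstream bridge; NULL on a root bus */
};

/*
 * Hypothetical NULL-safe variant of the bus selection in pcie_do_recovery().
 * The excerpt takes dev->bus->self->subordinate unconditionally, which
 * dereferences NULL for an RCiEP such as 74:02.0 sitting directly on a
 * root bus. Here we fall back to the device's own bus in that case.
 */
static struct pbus *recovery_bus(struct pdev *dev)
{
	if (dev->type == TYPE_ROOT_PORT || dev->type == TYPE_DOWNSTREAM)
		return dev->subordinate;
	if (dev->bus->self == NULL)   /* no upstream bridge, e.g. an RCiEP */
		return dev->bus;
	return dev->bus->self->subordinate;
}
```

With an RCiEP on root bus 74 (bus->self == NULL), recovery_bus() returns bus 74 itself instead of crashing, while an endpoint below a root port still gets the root port's subordinate bus, as before.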
We did not have the NULL pointer dereference issue before patch [1],
"PCI/ERR: Run error recovery callbacks for all affected devices" (v4.20).
Thanks,
Dongdong