Raj,

On Sun, Oct 4, 2020 at 12:57 PM Raj, Ashok <ashok.raj@xxxxxxxxxxxxxxx> wrote:
>
> Hi Ethan
>
> On Sat, Oct 03, 2020 at 03:55:09AM -0400, Ethan Zhao wrote:
> > Hi,folks,
> >
> > This simple patch set fixed some serious security issues found when DPC
> > error injection and NVMe SSD hotplug brute force test were doing -- race
> > condition between DPC handler and pciehp, AER interrupt handlers, caused
> > system hang and system with DPC feature couldn't recover to normal
> > working state as expected (NVMe instance lost, mount operation hang,
> > race PCIe access caused uncorrectable errors reported alternatively etc).
>
> I think maybe picking from other commit messages to make this description in
> cover letter bit clear. The fundamental premise is that when due to error
> conditions when events are processed by both DPC handler and hotplug handling of
> DLLSC both operating on the same device object ends up with crashes.

Yep, that's right.

Thanks,
Ethan

> >
> >
> > With this patch set applied, stable 5.9-rc6 on ICS (Ice Lake SP platform,
> > see
> > https://en.wikichip.org/wiki/intel/microarchitectures/ice_lake_(server))
> >
> > could pass the PCIe Gen4 NVMe SSD brute force hotplug test with any time
> > interval between hot-remove and plug-in operation tens of times without
> > any errors occur and system works normal.
> >
> > With this patch set applied, system with DPC feature could recover from
> > NON-FATAL and FATAL errors injection test and works as expected.
> >
> > System works smoothly when errors happen while hotplug is doing, no
> > uncorrectable errors found.
> >
> > Brute DPC error injection script:
> >
> > for i in {0..100}
> > do
> >   setpci -s 64:02.0 0x196.w=000a
> >   setpci -s 65:00.0 0x04.w=0544
> >   mount /dev/nvme0n1p1 /root/nvme
> >   sleep 1
> > done
> >
> > Other details see every commits description part.
> >
> > This patch set could be applied to stable 5.9-rc6/rc7 directly.
> >
> > Help to review and test.
> >
> > v2: changed according to review by Andy Shevchenko.
> > v3: changed patch 4/5 to simpler coding.
> > v4: move function pci_wait_port_outdpc() to DPC driver and its
> >     declaration to pci.h. (tip from Christoph Hellwig <hch@xxxxxxxxxxxxx>).
> > v5: fix building issue reported by lkp@xxxxxxxxx with some config.
> > v6: move patch[3/5] as the first patch according to Lukas's suggestion.
> >     and rewrite the comment part of patch[3/5].
> > v7: change the patch[4/5], based on Bjorn's code and truth table.
> >     change the patch[5/5] about the debug output information.
> >
> > Thanks,
> > Ethan
> >
> >
> > Ethan Zhao (5):
> >   PCI/ERR: get device before call device driver to avoid NULL pointer
> >     dereference
> >   PCI/DPC: define a function to check and wait till port finish DPC
> >     handling
> >   PCI: pciehp: check and wait port status out of DPC before handling
> >     DLLSC and PDC
> >   PCI: only return true when dev io state is really changed
> >   PCI/ERR: don't mix io state not changed and no driver together
> >
> >  drivers/pci/hotplug/pciehp_hpc.c |  4 ++-
> >  drivers/pci/pci.h                | 55 +++++++++++++-------------------
> >  drivers/pci/pcie/dpc.c           | 27 ++++++++++++++++
> >  drivers/pci/pcie/err.c           | 18 +++++++++--
> >  4 files changed, 68 insertions(+), 36 deletions(-)
> >
> >
> > base-commit: a1b8638ba1320e6684aa98233c15255eb803fac7
> > --
> > 2.18.4
> >