Re: [PATCH v3 0/7] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL

On 2018-07-19 01:14, Bjorn Helgaas wrote:
This is a v3 of Oza's patches [1].  It's available at [2] if you prefer
git.

v3 changes:
  - Add pci_aer_clear_fatal_status() to clear ERR_FATAL bits, only called
    from pcie_do_fatal_recovery().  Moved to first in series to avoid a
    window where ERR_FATAL recovery only clears ERR_NONFATAL bits.  Visible
    only inside the PCI core.
  - Instead of having pci_cleanup_aer_uncorrect_error_status() do different
    things based on dev->error_state, use this only for ERR_NONFATAL bits.
    I didn't change the name because it's used by many drivers.
  - Rename pci_cleanup_aer_error_device_status() to
    pci_aer_clear_device_status(), make it void, and make it visible only
    inside the PCI core.
  - Remove pcie_portdrv_err_handler.slot_reset altogether instead of making
    it a stub function.  Possibly pcie_portdrv_err_handler could be removed
    completely?
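
To illustrate the split between the two clearing helpers described in the
list above, here is a minimal userspace sketch in C.  It is not the kernel
code: struct mock_aer and the mock_* functions are hypothetical stand-ins.
It assumes that a bit in the Uncorrectable Error Status register counts as
ERR_FATAL when the matching bit in the Uncorrectable Error Severity register
is set, and that status bits are write-1-to-clear.

```c
#include <stdint.h>

/* Hypothetical stand-in for the AER registers of one device. */
struct mock_aer {
	uint32_t uncor_status;	/* models PCI_ERR_UNCOR_STATUS (write-1-to-clear) */
	uint32_t uncor_sever;	/* models PCI_ERR_UNCOR_SEVER: 1 = fatal, 0 = non-fatal */
};

/* Model a config write to a write-1-to-clear register. */
void mock_write_status(struct mock_aer *aer, uint32_t val)
{
	aer->uncor_status &= ~val;
}

/* Clear only the status bits whose severity is configured as fatal. */
void mock_clear_fatal_status(struct mock_aer *aer)
{
	uint32_t status = aer->uncor_status & aer->uncor_sever;

	if (status)
		mock_write_status(aer, status);
}

/* Clear only the status bits whose severity is configured as non-fatal. */
void mock_clear_nonfatal_status(struct mock_aer *aer)
{
	uint32_t status = aer->uncor_status & ~aer->uncor_sever;

	if (status)
		mock_write_status(aer, status);
}
```

With this split, the non-fatal path cannot wipe out a pending fatal bit, and
the fatal path cannot hide a non-fatal one.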

[1] https://lkml.kernel.org/r/1529661494-20936-1-git-send-email-poza@xxxxxxxxxxxxxx
[2] https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/?h=pci/06-22-oza-aer

---

Bjorn Helgaas (1):
      PCI/AER: Clear only ERR_FATAL status bits during fatal recovery

Oza Pawandeep (6):
      PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
      PCI/AER: Factor out ERR_NONFATAL status bit clearing
      PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
      PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
      PCI/AER: Clear device status bits during ERR_COR handling
      PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset


 drivers/pci/pci.h              |    5 ++++
 drivers/pci/pcie/aer.c         |   47 +++++++++++++++++++++++++++-------------
 drivers/pci/pcie/err.c         |   15 +++++--------
 drivers/pci/pcie/portdrv_pci.c |   25 ---------------------
 4 files changed, 43 insertions(+), 49 deletions(-)


Hi Bjorn,

I am planning some follow-up work after this series.


Your text:
"
1) I don't think the driver slot_reset callbacks should be responsible
for clearing these AER status bits.  Can we clear them somewhere in
the pcie_do_nonfatal_recovery() path and remove these calls from the
drivers?
"

Oza: We can do the following. In broadcast_error_message(), the
      if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
branch should do
      pci_walk_bus(dev->subordinate, pci_cleanup_aer_uncorrect_error_status, NULL);

Then we update all the drivers to remove their calls to
pci_cleanup_aer_uncorrect_error_status().
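
The bridge-level cleanup proposed in this reply can be sketched in userspace
C as follows.  Everything here is a hypothetical model (struct mock_node,
mock_walk_bus() and friends), not kernel code.  One detail worth checking
against the real tree: pci_walk_bus() takes a callback of the form
int (*cb)(struct pci_dev *, void *), so a one-argument cleanup function would
presumably need a small wrapper, as modelled below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device in a PCI hierarchy. */
struct mock_node {
	uint32_t uncor_status;		/* models PCI_ERR_UNCOR_STATUS */
	uint32_t uncor_sever;		/* models PCI_ERR_UNCOR_SEVER */
	struct mock_node *next;		/* next device on the same bus */
	struct mock_node *children;	/* first device on the subordinate bus */
};

typedef int (*mock_walk_cb)(struct mock_node *dev, void *userdata);

/* Depth-first walk of a bus; stops early if the callback returns nonzero. */
int mock_walk_bus(struct mock_node *first, mock_walk_cb cb, void *userdata)
{
	for (struct mock_node *dev = first; dev; dev = dev->next) {
		int ret = cb(dev, userdata);

		if (ret)
			return ret;
		ret = mock_walk_bus(dev->children, cb, userdata);
		if (ret)
			return ret;
	}
	return 0;
}

/* Models clearing the non-fatal bits: only fatal bits stay set. */
int mock_cleanup_uncorrect_status(struct mock_node *dev)
{
	dev->uncor_status &= dev->uncor_sever;
	return 0;
}

/* Wrapper that adapts the one-argument cleanup to the walk callback type. */
int mock_cleanup_cb(struct mock_node *dev, void *userdata)
{
	(void)userdata;
	return mock_cleanup_uncorrect_status(dev);
}

/* What the bridge branch would do: clean every device below the bridge. */
void mock_clear_below_bridge(struct mock_node *bridge)
{
	if (bridge->children)
		mock_walk_bus(bridge->children, mock_cleanup_cb, NULL);
}
```

In this model the bridge itself is not touched by the walk; only the devices
on its subordinate bus and below are cleaned.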


2) In principle, we should only read PCI_ERR_UNCOR_STATUS *once* per
device when handling an error.  We currently read it three times:

  aer_isr
    aer_isr_one_error
      find_source_device
        find_device_iter
          is_error_source
            read PCI_ERR_UNCOR_STATUS              # 1
Oza: This is the first legitimate read.
      aer_process_err_devices
        get_device_error_info(e_info->dev[i])
          read PCI_ERR_UNCOR_STATUS                # 2
Oza: As I see it, this read is used to check whether the link is healthy, so its purpose looks different to me.
        handle_error_source
          pcie_do_nonfatal_recovery
            ...
              report_slot_reset
                driver->err_handler->slot_reset
                  pci_cleanup_aer_uncorrect_error_status
                    read PCI_ERR_UNCOR_STATUS      # 3
Oza: pci_cleanup_aer_uncorrect_error_status() is generic and is able to clear the status.
For example, as I suggested in point 4, if we have to do
pci_walk_bus(dev->subordinate, pci_cleanup_aer_uncorrect_error_status, NULL);
then we have to read the status registers.
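
The "read once per device" idea from point 2 can be sketched in userspace C
like this.  All names (struct mock_device, struct mock_err_info and the
mock_* functions) are hypothetical, and the model ignores the second read,
which according to the reply above may serve a different purpose.  It only
shows the status being read while identifying the error source and the
cached value being reused when clearing.

```c
#include <stdint.h>

/* Hypothetical device model that counts status register reads. */
struct mock_device {
	uint32_t uncor_status;	/* models PCI_ERR_UNCOR_STATUS (write-1-to-clear) */
	unsigned int reads;	/* number of config reads of the status register */
};

/* Hypothetical per-error record that carries the cached status. */
struct mock_err_info {
	uint32_t status;
};

/* Every read of the status register goes through here and is counted. */
uint32_t mock_read_uncor_status(struct mock_device *dev)
{
	dev->reads++;
	return dev->uncor_status;
}

/* Read the status once while identifying the source, and cache it. */
int mock_is_error_source(struct mock_device *dev, struct mock_err_info *info)
{
	info->status = mock_read_uncor_status(dev);
	return info->status != 0;
}

/* Clear using the cached value: a write-1-to-clear write, no extra read. */
void mock_clear_cached_status(struct mock_device *dev,
			      const struct mock_err_info *info)
{
	dev->uncor_status &= ~info->status;
}
```

A side effect of clearing from the cached value is that only the bits that
were actually observed get cleared; a bit that is set after the read stays
pending.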


3) We need to get rid of pci_channel_io_frozen permanently.

Regards,
Oza.