RE: [PATCH] PCI/AER: save/restore AER registers during suspend/resume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks Bjorn and Keith for the response.
> 
> On Wed, Jul 24, 2019 at 01:45:48PM -0500, Bjorn Helgaas wrote:
> > On Wed, Jul 10, 2019 at 10:36:16AM -0700, sathyanarayanan kuppuswamy wrote:
> > > On 7/10/19 10:22 AM, Patel, Mayurkumar wrote:
> > > > > On 7/9/19 1:00 AM, Patel, Mayurkumar wrote:
> > > > > > After system suspend/resume cycle AER registers settings are
> > > > > > lost. Not restoring Root Error Command Register bits if it were
> > > > > > set, keeps AER interrupts disabled after system resume.
> > > > > > Moreover, AER mask and severity registers are also required
> > > > > > to be restored back to AER settings prior to system suspend.
> > > > > >
> > > > > > Signed-off-by: Mayurkumar Patel <mayurkumar.patel@xxxxxxxxx>
> > > > > > ---
> > > > > >    drivers/pci/pci.c      |  2 ++
> > > > > >    drivers/pci/pcie/aer.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >    include/linux/aer.h    |  4 ++++
> > > > > >    3 files changed, 55 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > > index 8abc843..40d5507 100644
> > > > > > --- a/drivers/pci/pci.c
> > > > > > +++ b/drivers/pci/pci.c
> > > > > > @@ -1340,6 +1340,7 @@ int pci_save_state(struct pci_dev *dev)
> > > > > >
> > > > > >    	pci_save_ltr_state(dev);
> > > > > >    	pci_save_dpc_state(dev);
> > > > > > +	pci_save_aer_state(dev);
> > > > > >    	return pci_save_vc_state(dev);
> > > > > >    }
> > > > > >    EXPORT_SYMBOL(pci_save_state);
> > > > > > @@ -1453,6 +1454,7 @@ void pci_restore_state(struct pci_dev *dev)
> > > > > >    	pci_restore_dpc_state(dev);
> > > > > >
> > > > > >    	pci_cleanup_aer_error_status_regs(dev);
> > > > > > +	pci_restore_aer_state(dev);
> > > > > >
> > > > > >    	pci_restore_config_space(dev);
> > > > > >
> > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > > > > index b45bc47..1acc641 100644
> > > > > > --- a/drivers/pci/pcie/aer.c
> > > > > > +++ b/drivers/pci/pcie/aer.c
> > > > > > @@ -448,6 +448,54 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
> > > > > >    	return 0;
> > > > > >    }
> > > > > >
> > > > > > +void pci_save_aer_state(struct pci_dev *dev)
> > > > > > +{
> > > > > > +	int pos = 0;
> > > > > > +	struct pci_cap_saved_state *save_state;
> > > > > > +	u32 *cap;
> > > > > > +
> > > > > > +	if (!pci_is_pcie(dev))
> > > > > > +		return;
> > > > > > +
> > > > > > +	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_ERR);
> > > > > > +	if (!save_state)
> > > > > > +		return;
> > > > > > +
> > > > > > +	pos = dev->aer_cap;
> > > > > > +	if (!pos)
> > > > > > +		return;
> > > > > > +
> > > > > > +	cap = &save_state->cap.data[0];
> > > > > > +	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, cap++);
> > > > > > +	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, cap++);
> > > > > > +	pci_read_config_dword(dev, pos + PCI_ERR_COR_MASK, cap++);
> > > > > > +	pci_read_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, cap++);
> >
> > > > > I don't see AER driver modifying UNCOR_MASK/SEVER/COR_MASK register. If
> > > > > all it has is default value then why do you want to preserve it ?
> > > > >
> > > > Thanks for reply.
> > > > You are right about UNCOR_MASK/SEVER/COR_MASK mask AER driver
> > > > leaves untouched, But IMHO users for example can use "setpci" to
> > > > unmask specific errors and its severity based on their debugging
> > > > requirement on Root port and/or Endpoint. So during resume, if
> > > > PCIe endpoint fails due to any error which by default stays masked
> > > > then it can't be catched and/or debugged.  Moreover, Endpoint
> > > > driver may also unmask during "driver probe" certain specific
> > > > errors for endpoint, which needs to be restored while resume.
> > >
> > > Isn't these registers configuration usually done by firmware ? I
> > > think user application rarely touch them. Also, IMO,
> > > Caching/Restoring registers under the assumption that it might be
> > > useful for user if they modified it is not a convincing argument.
> > > But I will let Bjorn and others decide whether its alright to cache
> > > these registers.
> >
> > I think the ideal user experience would be "suspend/resume has no
> > effect on any configuration".  That would argue for saving/restoring
> > registers even if we think it's unlikely the user would change them.
> 
> The call to pci_save_state most likely occurs long before a user has an
> opportunity to alter these regsiters, though. Won't this just restore
> what was previously there, and not the state you changed it to?


There were two things (not sure to call them issues),
1. PCI_ERR_ROOT_COMMAND resets to 0  during S3 entry/exit, which disables AER interrupt trigger
if AER happens on Endpoint after resume.

Also specified in spec. NCB-PCI_Express_Base_4.0r1.0_September-27-2017-c.pdf in
chapter 7.8.4.9 Root Error Command Register (Offset 2Ch) - in bitfields descriptions.
i.e. Correctable Error Reporting Enable – When Set, this bit enables the generation of an interrupt when
a correctable error is reported by any of the Functions in the Hierarchy Domain associated with this Root Port.

2. Root port resets to its default configuration of UNCOR_MASK/SEVER/COR_MASK register bits after system resume.
This influences user configurations, how errors shall be treated if AER happens on root port itself due to Device (for example
Endpoint not answering which results in completion timeouts on root ports).

Following is one example scenario which can handled with this patch.
- user configures AER registers using setpci certain masks and severity based on debug requirements. This can be applied on Root port of EP.
- triggers system test which includes S3 entry/exit cycles.
- system enters s3 -> AER registers settings are saved which has been configured by users.
- system exits s3 -> AER registers settings are restored which has been configured by users.


> 
> > > > @Bjorn/Anybody else has any opinion to cache/restore
> > > > UNCOR_MASK/SEVER/COR_MASK registers?  Please help to comment.
> > > >
> > > > > Also, any reason for not preserving ECRC settings ?
> > > > No specific reason. I can incorporte that with v2 of this patchset
> > > > but I don’t have HW on which I can validate that.
> >
> > I think we should preserve ECRC settings as well for the same reason.
> >
> > > > > > +}
> > > > > > +
> > > > > > +void pci_restore_aer_state(struct pci_dev *dev)
> > > > > > +{
> > > > > > +	int pos = 0;
> > > > > > +	struct pci_cap_saved_state *save_state;
> > > > > > +	u32 *cap;
> > > > > > +
> > > > > > +	if (!pci_is_pcie(dev))
> > > > > > +		return;
> > > > > > +
> > > > > > +	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_ERR);
> > > > > > +	if (!save_state)
> > > > > > +		return;
> > > > > > +
> > > > > > +	pos = dev->aer_cap;
> > > > > > +	if (!pos)
> > > > > > +		return;
> > > > > > +
> > > > > > +	cap = &save_state->cap.data[0];
> > > > > > +	pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, *cap++);
> > > > > > +	pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, *cap++);
> > > > > > +	pci_write_config_dword(dev, pos + PCI_ERR_COR_MASK, *cap++);
> > > > > > +	pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, *cap++);
> > > > > > +}
> > > > > > +
> > > > > >    void pci_aer_init(struct pci_dev *dev)
> > > > > >    {
> > > > > >    	dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> > > > > > @@ -1396,6 +1444,7 @@ static int aer_probe(struct pcie_device *dev)
> > > > > >    		return status;
> > > > > >    	}
> > > > > >
> > > > > > +	pci_add_ext_cap_save_buffer(port, PCI_EXT_CAP_ID_ERR, sizeof(u32) * 4);
> > > > > >    	aer_enable_rootport(rpc);
> > > > > >    	pci_info(port, "enabled with IRQ %d\n", dev->irq);
> > > > > >    	return 0;
> 
> You are allocating the capability save buffer in aer_probe(), so this
> save/restore applies only to root ports. But you mention above that you
> want to restore end devices, right?

That’s correct. I agree that my commit message was not so explicit.
But Since I included PCI_ERR_ROOT_COMMAND register for save/restore which is specific to Root ports only & I thought
endpoint drivers can handle to save/restore (UNCOR_MASK/SEVER/COR_MASK) themselves with its suspend/resume functions.

However I am fine to move pci_add_ext_cap_save_buffer() to some other function so
that it also save/restores UNCOR_MASK/SEVER/COR_MASK registers for endpoints and
if it's not useful to save/restore MASK & SEVER registers  then also fine to remove them
and just restore PCI_ERR_ROOT_COMMAND & ECRC settings.  Please let me know.





















Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux