Re: [PATCH] PCI: Disable async suspend/resume for Jmicron chip

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Bjorn,

On Fri, 2015-08-14 at 15:17 -0500, Bjorn Helgaas wrote:
> Hi Zhang,
> 
> On Wed, Jul 29, 2015 at 09:39:11AM +0800, Zhang Rui wrote:
> > From c6b063b2874b62466362a10b53097216b16e400d Mon Sep 17 00:00:00 2001
> > From: Zhang Rui <rui.zhang@xxxxxxxxx>
> > Date: Sun, 26 Jul 2015 14:15:36 +0800
> > Subject: [PATCH] PCI: Disable async suspend/resume for Jmicron chip
> > 
> > In https://bugzilla.kernel.org/show_bug.cgi?id=81551,
> > we found that Jmicron chip 361/363 is broken after resume if async noirq
> > (76569faa62 (PM / sleep: Asynchronous threads for resume_noirq)) is supported,
> > thus commit e6b7e41cdd (ata: Disabling the async PM for JMicron chip 363/361)
> > is introduced to fix this problem.
> > But then, we found that Jmicron chip 368 also has this problem, and it is decided
> > to disable the pm async feature for all the Jmicron chips.
> > 
> > But how to fix this was discussed in the mailing list for some time.
> > After some investigation, we believe that a proper fix is to disable
> > the async PM in PCI instead of ata driver, because, on this platform,
> > pci_resume_noirq() of IDE controller must happen after pci_resume_noirq()
> > of AHCI controller. But as .resume_noirq() of the pata_jmicron driver is
> > no-op, this suggests that it is the PCI common actions, aka,
> > pci_pm_default_resume_early(), have the dependency.
> > To fix this, using device_pm_wait_for_dev() in pata_jmicron driver can not
> > solve the dependency because pci_pm_default_resume_early() is invoked before
> > driver callback being invoked, plus, as it is the PCI common actions that
> > have the dependency, it is reasonable to fix it in PCI bus code,
> > rather than driver code.
> > 
> > This patch is made based on the patch from Liu Chuansheng at
> > https://lkml.org/lkml/2014/12/5/74
> > it reverts commit e6b7e41cdd ("ata: Disabling the async PM for JMicron
> > chip 363/361"), and introduces a PCI quirk to disable async PM for Jmicron
> > chips.
> 
> This changelog has a lot of text, but doesn't tell me what I really
> want to know.
> 
> We need to know what the problem is from the *device* point of view,
> not in terms of kernel function names.  Function names are only
> meaningful to a handful of experts, but a concrete problem description
> may be useful to hardware designers and firmware writers who can
> potentially fix the root issue.
> 
Well, I sent this patch just because there is a regression since 3.15,
and we already have two patches that have been verified to fix the
problem, but none of them are pushed for upstream. I don't have many PCI
background, thus I chose to use function names to make myself precise
enough but apparently I failed :p
Maybe I should send this patch as a RFC patch to get a perfect fix.

> In this case, I think the problem is something like: "on these
> multi-function JMicron devices, the PATA controller at function 1
> doesn't work if it is powered on before the SATA controller at
> function 0."
> 
exactly.

> It's also helpful to include a symptom to help people connect a
> problem with the solution.  For example, the 81551 bug reports says
> these are symptoms:
> 
>   pata_jmicron 0000:02:00.1: Refused to change power state, currently in D3
>   irq 17: nobody cared (try booting with the "irqpoll" option)
> 
okay.

> This patch doesn't change the workaround: we still use
> device_disable_async_suspend() on these devices.  This patch merely:
> 
>   1) Changes the workaround so instead of doing this only for 361 and
>   363 devices, we turn off async suspend on *all* JMicron devices, and
> 
>   2) Moves the workaround from the AHCI and PATA drivers to the PCI
>   core.
> 
> For 1), I think it is probably overkill to penalize all JMicron
> devices.  There's no reason to think NICs and SD/MMC/etc controllers
> have the same issue.  Or if there *is* reason to think that, you
> should give evidence for it.

No, I don't have any evidence. It's just because the previous fix
becomes insufficient, which makes people wondering if we should disable
all of them.

> 
> For 2), you have not made a convincing argument for why the workaround
> needs to be in the PCI core.  Such an argument would be of the form
> "we need this quirk even if the driver hasn't been loaded yet
> because ..."

Good point, Jay and Barto, can you please verify this?

> Since you didn't say anything like that, I assume the patch in comment
> #72 of bug #81551 (https://bugzilla.kernel.org/attachment.cgi?id=156301)
> would work as well.  That has the advantage that it wouldn't penalize
> non-storage controllers.
> 
Yes, the fix in driver code also works. But let's wait for the test
results from Jay and Barto because this problem really sounds like a
dependency in PCI code.

> Other minor nits:
> 
> - The function "pci_resume_noirq()" does not exist.  I assume you meant
>   pci_pm_resume_noirq().  And as I mentioned earlier, I don't think the
>   name is relevant in the changelog anyway.
> 
> - You have a mix of SHA1 lengths and summary quoting above.  Please use
>   the conventional commit reference style (with 12-char SHA-1)
>   consistently, e.g.,
> 
>     76569faa62c4 ("PM / sleep: Asynchronous threads for resume_noirq")
> 
> - Please use http://lkml.kernel.org/r/... references when possible,
>   not https://lkml.org/....
> 
Thanks for pointing these out. I was not aware of such mistakes before.
Thank you.

-rui
> > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=81511
> > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=81551
> > Reviewed-by: Aaron Lu <aaron.lu@xxxxxxxxx>
> > Acked-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx>
> > Acked-by: Tejun Heo <tj@xxxxxxxxxx>
> > Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
> > ---
> >  drivers/ata/ahci.c         | 12 ------------
> >  drivers/ata/pata_jmicron.c | 12 ------------
> >  drivers/pci/quirks.c       | 17 +++++++++++++++++
> >  3 files changed, 17 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> > index 7e62751..26bb40d 100644
> > --- a/drivers/ata/ahci.c
> > +++ b/drivers/ata/ahci.c
> > @@ -1451,18 +1451,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> >  	else if (pdev->vendor == 0x177d && pdev->device == 0xa01c)
> >  		ahci_pci_bar = AHCI_PCI_BAR_CAVIUM;
> >  
> > -	/*
> > -	 * The JMicron chip 361/363 contains one SATA controller and one
> > -	 * PATA controller,for powering on these both controllers, we must
> > -	 * follow the sequence one by one, otherwise one of them can not be
> > -	 * powered on successfully, so here we disable the async suspend
> > -	 * method for these chips.
> > -	 */
> > -	if (pdev->vendor == PCI_VENDOR_ID_JMICRON &&
> > -		(pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 ||
> > -		pdev->device == PCI_DEVICE_ID_JMICRON_JMB361))
> > -		device_disable_async_suspend(&pdev->dev);
> > -
> >  	/* acquire resources */
> >  	rc = pcim_enable_device(pdev);
> >  	if (rc)
> > diff --git a/drivers/ata/pata_jmicron.c b/drivers/ata/pata_jmicron.c
> > index 47e418b..4d1a5d2 100644
> > --- a/drivers/ata/pata_jmicron.c
> > +++ b/drivers/ata/pata_jmicron.c
> > @@ -143,18 +143,6 @@ static int jmicron_init_one (struct pci_dev *pdev, const struct pci_device_id *i
> >  	};
> >  	const struct ata_port_info *ppi[] = { &info, NULL };
> >  
> > -	/*
> > -	 * The JMicron chip 361/363 contains one SATA controller and one
> > -	 * PATA controller,for powering on these both controllers, we must
> > -	 * follow the sequence one by one, otherwise one of them can not be
> > -	 * powered on successfully, so here we disable the async suspend
> > -	 * method for these chips.
> > -	 */
> > -	if (pdev->vendor == PCI_VENDOR_ID_JMICRON &&
> > -		(pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 ||
> > -		pdev->device == PCI_DEVICE_ID_JMICRON_JMB361))
> > -		device_disable_async_suspend(&pdev->dev);
> > -
> >  	return ata_pci_bmdma_init_one(pdev, ppi, &jmicron_sht, NULL, 0);
> >  }
> >  
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index e9fd0e9..02803f8 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -29,6 +29,23 @@
> >  #include "pci.h"
> >  
> >  /*
> > + * For JMicron chips, we need to disable the async_suspend method, otherwise
> > + * they will hit the power-on issue when doing device resume, add one quick
> > + * solution to disable the async_suspend method.
> > + *
> > + * https://bugzilla.kernel.org/show_bug.cgi?id=81551
> > + */
> > +static void pci_async_suspend_fixup(struct pci_dev *pdev)
> > +{
> > +	/*
> > +	 * disabling the async_suspend method for JMicron chips to
> > +	 * avoid device resuming issue.
> > +	 */
> > +	device_disable_async_suspend(&pdev->dev);
> > +}
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, pci_async_suspend_fixup);
> > +
> > +/*
> >   * Decoding should be disabled for a PCI device during BAR sizing to avoid
> >   * conflict. But doing so may cause problems on host bridge and perhaps other
> >   * key system devices. For devices that need to have mmio decoding always-on,
> > -- 
> > 1.9.1
> > 
> > 
> > 


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux