Re: IWL errors when reading PCI config through /sys


 



On Tue, Nov 05, 2024 at 01:24:59AM +0100, Jan Šídlo wrote:
> On Mon, 2024-11-04 at 17:33 -0600, Bjorn Helgaas wrote:
> > It *should* be safe to read "config" from sysfs at any time, and
> > also to write to the ASPM "policy" module parameter file at any
> > time, but there could be bugs there.
> > 
> > When you say "crash", I guess you mean all the iwlwifi error
> > logging and the WARN_ON() stacktraces, right?   I don't see an
> > actual oops or panic in the logs yet.
> 
> There is no crash in form of an oops from the kernel fortunately :)
> But the WLAN card stops talking & IWL driver is not able to recover.
> Only shutdown fixes the issue. I did not try just reboot to be
> honest as I thought that full powercycle is necessary to properly
> reset the device - but I can try tomorrow if necessary.
> 
> > I assume none of these happen unless you are running your script
> > or writing the "policy" parameter?  Does the problem happen if you
> > *only* run your script to scrape the info from "config"?  What
> > about if you *only* update the "policy" parameter?
> 
> The error does not happen if I read the config - I tested that
> properly. Without touching the ASPM policy the script is able to run
> without any problems. And also I can trigger the bug immediately
> when I write "powersave" to the ASPM policy without the script.

Perfect, thanks for narrowing that down!

> > Emmanuel is right; the iwlwifi logging (e.g., "iwlwifi
> > 0000:04:00.0: 0xFFFFFFFF | ADVANCED_SYSASSERT") sure looks like
> > reads from the device are failing so we get ~0 data.  I'm guessing
> > those come from a BAR, so the BAR could be disabled or the device
> > might not be responding e.g., if it is in a low-power state (D1,
> > D2, D3hot, D3cold) or being reset.
> 
> Device is reported being in D0 through the sysfs, but I'm not sure
> if that is really correct, because if I do echo 1 > remove and
> rescan, the kernel complains about not being able to talk to the
> device. I can get the exact error tomorrow if you'd like.

It's unavoidably racy to read the current state from config space.
But since you've identified the write to "policy" in
pcie_aspm_set_policy() as the critical item, I think that's the place
to look.
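
For a quick sanity check of what config space itself says, something
like the sketch below could decode a dump of the sysfs "config" file.
It is only an illustration: the helper names are made up, it assumes
the standard layout (Capabilities Pointer at 0x34, PM capability ID
0x01, PowerState in PMCSR bits 1:0), and without root sysfs typically
exposes only the first 64 bytes, so the capability walk would find
nothing. The result is still just a snapshot.

```python
import struct
from pathlib import Path

PCI_STATUS = 0x06            # Status register offset
PCI_STATUS_CAP_LIST = 0x10   # "Capabilities List" bit in Status
PCI_CAPABILITY_LIST = 0x34   # Capabilities Pointer offset
PCI_CAP_ID_PM = 0x01         # Power Management capability ID
PCI_PM_CTRL = 4              # PMCSR offset inside the PM capability


def device_responds(config: bytes) -> bool:
    """A Vendor ID of 0xFFFF means the read got all ones (~0) back."""
    (vendor_id,) = struct.unpack_from("<H", config, 0)
    return vendor_id != 0xFFFF


def find_capability(config: bytes, cap_id: int):
    """Walk the capability list; return the offset of cap_id or None."""
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    hops = 0
    # Bound the walk so a corrupt (e.g. all-ones) list cannot loop forever.
    while ptr >= 0x40 and ptr + 1 < len(config) and hops < 48:
        if config[ptr] == cap_id:
            return ptr
        ptr = config[ptr + 1] & 0xFC
        hops += 1
    return None


def pm_state(config: bytes):
    """Return 'D0'..'D3hot' from PMCSR, or None if it cannot be read."""
    if not device_responds(config):
        return None
    off = find_capability(config, PCI_CAP_ID_PM)
    if off is None or off + PCI_PM_CTRL + 2 > len(config):
        return None
    (pmcsr,) = struct.unpack_from("<H", config, off + PCI_PM_CTRL)
    return ("D0", "D1", "D2", "D3hot")[pmcsr & 0x3]


def read_config(bdf: str) -> bytes:
    """Read the raw config dump for a device such as '0000:04:00.0'."""
    return Path("/sys/bus/pci/devices", bdf, "config").read_bytes()
```

E.g., pm_state(read_config("0000:04:00.0")) as root; a None result
means either the device returned ~0 or the PM capability was not
visible in the dump.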

We had some recent issues related to configuring ASPM while the device
was in a low-power state, e.g.,
https://lore.kernel.org/linux-pci/20240130163519.GA521777@bhelgaas/

While pcie_config_aspm_link() does check dev->current_state, I don't
see anything that would prevent the power management framework from
changing the power state while we're configuring devices to match the
new ASPM state.
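
If it helps with testing, a rough way to look for that window from
userspace is to sample the device's "power_state" sysfs attribute
right before and right after the policy write. This is a sketch, not
a proof of the race: the function names are invented for the example,
it assumes a kernel that has the "power_state" attribute, and as far
as I know that attribute reports the kernel's cached
dev->current_state, so a transition between the two samples can
still be missed.

```python
from pathlib import Path


def active_policy(text: str) -> str:
    """Return the bracketed entry, e.g. 'default' from '[default] performance ...'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError("no active policy marked with [brackets]")


def set_policy_and_sample(bdf: str, policy: str, root: Path = Path("/sys")):
    """Write the ASPM policy and return (state_before, state_after).

    'root' is only a parameter so the function can be pointed at a
    fake tree; on a real system it is /sys and needs root privileges.
    """
    state_file = root / "bus/pci/devices" / bdf / "power_state"
    policy_file = root / "module/pcie_aspm/parameters/policy"
    before = state_file.read_text().strip()
    policy_file.write_text(policy)
    after = state_file.read_text().strip()
    return before, after
```

A pair other than ("D0", "D0") around the "powersave" write would be
a strong hint; ("D0", "D0") does not rule the race out.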

Bjorn



