Ben Greear <greearb@xxxxxxxxxxxxxxx> writes: > On 10/13/2017 08:50 AM, Adrian Chadd wrote: >> On 13 October 2017 at 05:41, Kalle Valo <kvalo@xxxxxxxxxxxxxxxx> wrote: >>> greearb@xxxxxxxxxxxxxxx writes: >>> >>>> From: Ben Greear <greearb@xxxxxxxxxxxxxxx> >>>> >>>> This works around a problem we see when sometimes the wifi NIC does >>>> not respond the first time. This seems to happen especially often on >>>> some of the 9984 NICs in mid-range platforms. >>>> >>>> Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx> >>> >>> [...] >>> >>>> -static int ath10k_pci_probe(struct pci_dev *pdev, >>>> - const struct pci_device_id *pci_dev) >>>> +static int __ath10k_pci_probe(struct pci_dev *pdev, >>>> + const struct pci_device_id *pci_dev) >>>> { >>>> int ret = 0; >>>> struct ath10k *ar; >>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev, >>>> return ret; >>>> } >>>> >>>> +static int ath10k_pci_probe(struct pci_dev *pdev, >>>> + const struct pci_device_id *pci_dev) >>>> +{ >>>> + int cnt = 0; >>>> + int rv; >>>> + do { >>>> + rv = __ath10k_pci_probe(pdev, pci_dev); >>>> + if (rv == 0) >>>> + return rv; >>>> + pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt); >>>> + mdelay(10); /* let the ath10k firmware gerbil take a small break */ >>>> + } while (cnt++ < 10); >>>> + return rv; >>>> +} >>> >>> This is a sledgehammer approach and it causes reload for all error >>> cases, like when hardware is broken or memory allocation is failing. >>> >>> When the problem happens does it always fail at the the same place? Is >>> it hw reset or something else? It's better to retry the invidiual action >>> than to do this hack. Or is it just some more delay needed somewhere? >> >> I am seeing WMI timeouts during initial firmware load and wait on >> QCA9984 + BCM7444S SoC. >> My guess is the WMI wakeup time is not "right" enough and needs to be >> extended a little bit. >> >> But then, I have played a lot of whackamole with WMI timeouts during >> my loooong porting effort.. > > The failure I saw was a failure to wake pci, and from comments, it seems that > the current wait is longer than what should be required, and it warns on slow > wakes, and I never saw that warning. So I assume that waiting longer would not help. > > I saw it fail twice in a row to wake pci and then succeed on the third > try, for instance, > when testing my patch. > > As for a big hammer, I guess we could check for certain return codes if you think > that is better than just retrying all failures? ath10k_pci_probe() has a lots of stuff which should not affect your problem, like allocating memory, setting up timers and interrupts etc. It's quite ugly to redo that in every cycle. A more fine grained solution, like looping specific action (reset, wake whatever) is much more preferred. Do you have debug logs of failing cases? -- Kalle Valo