Search Linux Wireless

Re: [PATCH v2] ath10k: Retry pci probe on failure.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ben Greear <greearb@xxxxxxxxxxxxxxx> writes:

> On 10/13/2017 08:50 AM, Adrian Chadd wrote:
>> On 13 October 2017 at 05:41, Kalle Valo <kvalo@xxxxxxxxxxxxxxxx> wrote:
>>> greearb@xxxxxxxxxxxxxxx writes:
>>>
>>>> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>>>>
>>>> This works around a problem we see when sometimes the wifi NIC does
>>>> not respond the first time.  This seems to happen especially often on
>>>> some of the 9984 NICs in mid-range platforms.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>>>
>>> [...]
>>>
>>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> -                         const struct pci_device_id *pci_dev)
>>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>>> +                           const struct pci_device_id *pci_dev)
>>>>   {
>>>>        int ret = 0;
>>>>        struct ath10k *ar;
>>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>        return ret;
>>>>   }
>>>>
>>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> +                         const struct pci_device_id *pci_dev)
>>>> +{
>>>> +     int cnt = 0;
>>>> +     int rv;
>>>> +     do {
>>>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>>>> +             if (rv == 0)
>>>> +                     return rv;
>>>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>>> +     } while (cnt++ < 10);
>>>> +     return rv;
>>>> +}
>>>
>>> This is a sledgehammer approach and it causes reload for all error
>>> cases, like when hardware is broken or memory allocation is failing.
>>>
>>> When the problem happens does it always fail at the the same place? Is
>>> it hw reset or something else? It's better to retry the invidiual action
>>> than to do this hack. Or is it just some more delay needed somewhere?
>>
>> I am seeing WMI timeouts during initial firmware load and wait on
>> QCA9984 + BCM7444S SoC.
>> My guess is the WMI wakeup time is not "right" enough and needs to be
>> extended a little bit.
>>
>> But then, I have played a lot of whackamole with WMI timeouts during
>> my loooong porting effort..
>
> The failure I saw was a failure to wake pci, and from comments, it seems that
> the current wait is longer than what should be required, and it warns on slow
> wakes, and I never saw that warning.  So I assume that waiting longer would not help.
>
> I saw it fail twice in a row to wake pci and then succeed on the third
> try, for instance,
> when testing my patch.
>
> As for a big hammer, I guess we could check for certain return codes if you think
> that is better than just retrying all failures?

ath10k_pci_probe() has a lots of stuff which should not affect your
problem, like allocating memory, setting up timers and interrupts etc.
It's quite ugly to redo that in every cycle. A more fine grained
solution, like looping specific action (reset, wake whatever) is much
more preferred.

Do you have debug logs of failing cases?

-- 
Kalle Valo



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux