On 2020-08-21 04:59, Ben Greear wrote:
On 8/20/20 1:15 PM, Krishna Chaitanya wrote:
On Thu, Aug 20, 2020 at 11:23 PM Ben Greear <greearb@xxxxxxxxxxxxxxx>
wrote:
On 8/20/20 10:42 AM, Krishna Chaitanya wrote:
On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya
<chaitanya.mgit@xxxxxxxxx> wrote:
On Thu, Aug 20, 2020 at 10:38 PM Ben Greear
<greearb@xxxxxxxxxxxxxxx> wrote:
On 8/20/20 10:00 AM, Krishna Chaitanya wrote:
On Thu, Aug 20, 2020 at 10:02 PM Ben Greear
<greearb@xxxxxxxxxxxxxxx> wrote:
On 8/20/20 9:08 AM, Krishna Chaitanya wrote:
On Thu, Aug 20, 2020 at 8:07 PM Wen Gong <wgong@xxxxxxxxxxxxxx>
wrote:
On 2020-08-20 18:52, Krishna Chaitanya wrote:
On Thu, Aug 20, 2020 at 3:45 PM Wen Gong
<wgong@xxxxxxxxxxxxxx> wrote:
On 2020-08-20 17:19, Krishna Chaitanya wrote:
...
I'm not really convinced that this is the right fix, but
I'm no NAPI
expert. Can anyone else help?
Calling napi_disable() twice can lead to hangs, but moving
NAPI from
start/stop to
the probe isn't the right approach as the datapath is tied
to
start/stop.
Maybe check the state of NAPI before disable?
if (test_bit(NAPI_STATE_SCHED, &ar->napi.napi.state))
napi_disable(&ar->napi)
or maintain napi_state like this
https://patchwork.kernel.org/patch/10249365/
it is better to use above link's patch.
napi.state is controlled by napi API, it is better ath10k
not know it.
Sure, but IMHO just canceling the async rx work should solve
the issue.
Oh no, canceling the async rx work will not solve this issue,
rx worker
ath10k_rx_indication_async_work call napi_schedule, after
napi_complete,
the NAPI_STATE_SCHED will clear.
The issue of this patch is because 2 thread called to hif_stop
and
NAPI_STATE_SCHED not clear.
That fix is still valid and good to have.
ndev_stop being called twice is typical scenarios (stop vs
rmmod), so
just checking the netdev_flags for IFF_UP and returning
from hif_Stop
should suffice, no?
My approach to fix this problem was to add a boolean in ath10k
as to whether
it had napi enabled or not, and then check that before trying to
enable/disable
it again. Seems to work fine, and cleaner in my mind than
checking internal
napi flags.
A much simpler approach is just to check for IFF_UP and skip NAPI
(and others)
in the hif_stop no? (provided proper RTNL locking is done if
hif_stop
is being called
internally as well).
I'm not sure, but I think the driver should be internally
consistent and not
spend a lot of time trying to guess about interactions with
objects higher
in the stack.
Fair enough, the network interface state is a basic thing
controlled
by the driver,
so, should be okay to use. Anyways, the in-driver approach has more
control.
Here is my original patch to fix this, it is not complex.
https://patchwork.kernel.org/patch/10249363/
Sure, I have shared your patch above :).
Sent a bit early, any idea why this wasn't upstreamed earlier?
No, one comment from Michal indicated maybe there were more problems
lurking
in this area, but he seemed to be OK with the patch over all. After
that,
it was just ignored.
Now might be a good time to push for it :)
It is generally a waste of time in my experience. Kalle is the
maintainer and should
be seeing any of this he cares to see. If he likes the patch, he can
apply it or
something similar. If you have a reproducible test case, see if the
patch fixes
things, that might help it be accepted.
I have 2 cmd, each one can reproduce the hang.
echo soft >
/sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep
0.05;ifconfig wlan0 down
echo soft >
/sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod
ath10k_sdio
and with the my patch, it fix the hang. Change of my patch is similar
with your
patch(https://patchwork.kernel.org/patch/10249365/), so it should also
fix the hang with your patch.
Thanks,
Ben