On 06/10/2018 10:10 AM, Michał Kazior wrote:
Ben,
The patch only treats a symptom. fq_tin_dequeue() already checks if the list
is empty before it tries to access first entry. I see no point in
using the _or_null() + WARN_ON.
The 0x3c deref is likely an offset from a NULL base pointer. Did you
check gdb/addr2line of the ieee80211_tx_dequeue+0xfb? Where did it
point to?
gdb pointed to one line above the flow dereference, which is why I was
going to put some debugging in there.
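To illustrate the point about 0x3c: when code reads a member through a
NULL struct pointer, the faulting address equals that member's offset
inside the struct. The sketch below uses a made-up struct (demo_obj is
hypothetical, it is not struct fq_flow or any mac80211 type) padded so
that one field sits at offset 0x3c. Which real structure and field were
involved in the crash is not known from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT a real kernel struct. */
struct demo_obj {
	char pad[0x3c];   /* filler so that 'field' lands at offset 0x3c */
	uint32_t field;
};

/*
 * Address the CPU would access for base->field, computed arithmetically
 * so that nothing is actually dereferenced.
 */
static inline uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct demo_obj, field);
}

/* With a NULL (0) base, the access address is just the member offset. */
static inline void demo_check(void)
{
	assert(member_address(0) == 0x3c);
}
```

On the real kernel build, the layout of a candidate struct can be checked
against the faulting offset in the same way, for example with offsetof()
or by inspecting the struct in gdb.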
I suspect there's not enough synchronization between quiescing the
device/ath10k after fw crashes and performing mac80211's reconfig
procedure.
I am already running this patch which helps with some of that. That
patch never made it upstream, but it fixed problems for me earlier.
https://patchwork.kernel.org/patch/9457639/
There could easily be more issues in that logic.
Someone else posted a patch to disable mac80211 tx when the FW crashes,
I think. I have not tried to backport that.
https://patchwork.kernel.org/patch/10411967/
Thanks,
Ben
Michał
On 8 June 2018 at 23:40, Arend van Spriel <arend.vanspriel@xxxxxxxxxxxx> wrote:
On 6/8/2018 5:17 PM, Ben Greear wrote:
I recalled an email from Michał about leaving Tieto, so I am adding the
alternate email address he provided back then.
Gr. AvS
On 06/07/2018 04:59 PM, Cong Wang wrote:
On Thu, Jun 7, 2018 at 4:48 PM, <greearb@xxxxxxxxxxxxxxx> wrote:
diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..cb911f0 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
return NULL;
}
- flow = list_first_entry(head, struct fq_flow, flowchain);
+ flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
+
+ if (WARN_ON_ONCE(!flow))
+ return NULL;
This does not make sense either. list_first_entry_or_null()
returns NULL only when the list is empty, but we already check
list_empty() right before this code, and it is protected by fq->lock.
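The argument above can be sketched in user space. The block below is a
simplified re-creation of the kernel list helpers, not the kernel's own
definitions (the real list_first_entry_or_null() also uses READ_ONCE),
and demo_flow and first_flow() are hypothetical names. It shows that
once list_empty() has been checked, list_first_entry() and
list_first_entry_or_null() give the same answer, as long as nothing
modifies the list between the check and the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-ins for the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	container_of((head)->next, type, member)
#define list_first_entry_or_null(head, type, member) \
	(list_empty(head) ? NULL : list_first_entry(head, type, member))

static inline void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical stand-in for struct fq_flow. */
struct demo_flow {
	int id;
	struct list_head flowchain;
};

/* Same shape as the patched code path: empty check first, then lookup. */
static inline struct demo_flow *first_flow(struct list_head *head)
{
	if (list_empty(head))
		return NULL;
	return list_first_entry(head, struct demo_flow, flowchain);
}

static inline void demo(void)
{
	struct list_head head;
	struct demo_flow f = { .id = 1 };

	list_init(&head);
	/* Empty list: both forms report "no entry". */
	assert(first_flow(&head) == NULL);
	assert(list_first_entry_or_null(&head, struct demo_flow, flowchain) == NULL);

	list_add_tail(&f.flowchain, &head);
	/* Non-empty list: both forms return the same entry. */
	assert(first_flow(&head) == &f);
	assert(list_first_entry_or_null(&head, struct demo_flow, flowchain) == &f);
}
```

If the WARN_ON_ONCE in the patch ever fired, that would therefore suggest
the list changed between the two steps, which fits the suspicion about
locking rather than a problem in the list helpers themselves.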
Hello Michal,
git blame shows you as the author of the fq_impl.h code.
I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks
kernel. There was an apparent mostly-null deref in the fq_tin_dequeue
method. According to gdb, it was within 1 line of the dereference of
'flow'.
My hack above is probably not that useful. Cong thinks maybe the
locking is bad.
If you get a chance, please review this thread and see if you have any
ideas for a better fix (or better debugging code).
As always, if you would like me to generate you a buggy firmware that
will crash in the tx path and cause all sorts of mayhem in the ath10k
driver and wifi stack, I will be happy to do so.
https://www.mail-archive.com/netdev@xxxxxxxxxxxxxxx/msg239738.html
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com