Search Linux Wireless

Re: [RFC] ath10k: Fix DMA errors related to beacons (CT FW only)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/28/19 10:54 AM, Michał Kazior wrote:
On Thu, 28 Feb 2019 at 16:59, <greearb@xxxxxxxxxxxxxxx> wrote:

From: Ben Greear <greearb@xxxxxxxxxxxxxxx>

I often saw the ath10k-ct wave-1 firmware spit DMA errors and
hang the entire system, requiring a hard power-cycle to revoer.

It appears the issue is that there is no beacon-tx callback in
stock firmware, so the driver can delete the beacon DMA buffer
while firmware is still trying to access it.

So, wave-1 ath10k-ct firmware now sends a beacon-tx-complete
wmi message and that allows the driver to safely know when it
can clean up the buffer.

Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
---

NOTE:  This will not apply or work in upstream kernels since the
rest of the CT fw support will not be accepted.  But, I'd appreciate
any technical feedback on this in case I missed any corner cases
on locking or similar.

For the record this patch seems to be based on code with "ath10k: Free
beacon buf later in vdev teardown" included.


[...]
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 154dcdabc48a..02a8efa2e783 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
[...]
@@ -6147,6 +6151,16 @@ static void ath10k_remove_interface(struct ieee80211_hw *hw,
                 ath10k_warn(ar, "failed to stop spectral for vdev %i: %d\n",
                             arvif->vdev_id, ret);

+       if (test_bit(ATH10K_FW_FEATURE_BEACON_TX_CB_CT,
+                     ar->running_fw->fw_file.fw_features)) {
+               int time_left;
+
+               time_left = wait_for_completion_timeout(&arvif->beacon_tx_done, (3 * HZ));
+               if (!time_left)
+                       ath10k_warn(ar, "WARNING: failed to wait for beacon tx callback for vdev %i: %d\n",
+                                   arvif->vdev_id, ret);
+       }

I think this can race against wmi tx credits replenishment and maybe
other wmi commands if they get issued (not many of them I suppose
though). The ordering would need to be something like this:

cpu0                   cpu1
                        ath10k_wmi_op_ep_tx_credits
                        ieee80211_iterate_active_interfaces_atomic
                        ath10k_wmi_beacon_send_ref_nowait
                        # preempted
ieee80211_do_stop
clear sdata running
drv_remove_interface
wait_for_completion_timeout
ath10k_mac_vif_beacon_cleanup
free/unmap
                        gen_beacon_dma(bcn->xxx)
                        reinit_completion()
                        ath10k_wmi_cmd_send_nowait()

Interestingly, this also shows it has been a possible use-after-free
without your patch. This is probably a unlikely scenario because
there's a disproportionate amount of code between these flows for this
to happen.

Thanks for the thorough review.  One thing, the firmware
cannot guarantee it can send the event (due to potential WMI event buffer
exhaustion).  It would be rare that you got unlucky enough to hit the
case where FW ran out of msg buffers right as you were deleting an
interface though.

As far as I can tell, my patch at least does not make the use-after-free
scenario worse, so I'm tempted to ignore that for now.


There's also a memory leak because ath10k_mac_vif_beacon_free() exits
without freeing arvif->beacon if beacon_state is SENDING, which it
will be in in the above race case.

Ok, this could be fixed by definitely freeing the mem in the remove_interface
logic, and is not new to my patch, right?


With the per-skb beaconing case (which is dead code by the way it
seems..) iommu wouldn't be happy either because we'd hand over an
unmapped paddr to ath10k_wmi_cmd_send_nowait(). That's assuming kernel
didn't crash due to use-after-free on arvif->beacon (stored in `bcn`
on stack) before that >
This feels like a candidate for rcu if you wanted to have it fixed
neatly. It would fix both use-after-free and the locking conundrum -
ath10k_remove_interface() could NULL the pointer, call_rcu() and
wait_for_completion() in that call handler, and when its called back
unmap/free the beacon. This would require a refactor to how beacon
stuff is stored/maintained - a helper structure that would need carry
the beacon stuff + rcu_head, and this structure would be allocated and
assigned with rcu_assign_pointer() to a pointer in ath10k_vif. But I
feel like it's asking for unbound allocations or other bugs,
especially since I'm...

This callback only happens at all with CT fw, and I definitely don't want to
deal with an out-of-tree rcu patch.  And, you are not guaranteed to get
the callback as previously mentioned.  So I think rcu might be overkill
for this?


Not sure if it makes any sense. I would probably just rip out the
arvif->beacon code completely instead and keep arvif->beacon_buf only
(memcpy on swba and immediatelly free the skbuff). No point in fixing
dead code, is there?

Since you have beacon tx completion you can consider preventing beacon
corruption, similar to how disabled vsync causes frame tearing, by
avoiding memcpy() if completion for previous one didn't come. Not sure
how relevant that is though because if fw/hw are having hard time
transmitting a beacon then the RF condition must be really harsh and
unusably bad anyway.

Thanks,
Ben


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com




[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux