Search Linux Wireless

Re: [RFC] ath10k: Fix DMA errors related to beacons (CT FW only)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 28 Feb 2019 at 16:59, <greearb@xxxxxxxxxxxxxxx> wrote:
>
> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>
> I often saw the ath10k-ct wave-1 firmware spit DMA errors and
> hang the entire system, requiring a hard power-cycle to revoer.
>
> It appears the issue is that there is no beacon-tx callback in
> stock firmware, so the driver can delete the beacon DMA buffer
> while firmware is still trying to access it.
>
> So, wave-1 ath10k-ct firmware now sends a beacon-tx-complete
> wmi message and that allows the driver to safely know when it
> can clean up the buffer.
>
> Signed-off-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
> ---
>
> NOTE:  This will not apply or work in upstream kernels since the
> rest of the CT fw support will not be accepted.  But, I'd appreciate
> any technical feedback on this in case I missed any corner cases
> on locking or similar.

For the record this patch seems to be based on code with "ath10k: Free
beacon buf later in vdev teardown" included.


[...]
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 154dcdabc48a..02a8efa2e783 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
[...]
> @@ -6147,6 +6151,16 @@ static void ath10k_remove_interface(struct ieee80211_hw *hw,
>                 ath10k_warn(ar, "failed to stop spectral for vdev %i: %d\n",
>                             arvif->vdev_id, ret);
>
> +       if (test_bit(ATH10K_FW_FEATURE_BEACON_TX_CB_CT,
> +                     ar->running_fw->fw_file.fw_features)) {
> +               int time_left;
> +
> +               time_left = wait_for_completion_timeout(&arvif->beacon_tx_done, (3 * HZ));
> +               if (!time_left)
> +                       ath10k_warn(ar, "WARNING: failed to wait for beacon tx callback for vdev %i: %d\n",
> +                                   arvif->vdev_id, ret);
> +       }

I think this can race against wmi tx credits replenishment and maybe
other wmi commands if they get issued (not many of them I suppose
though). The ordering would need to be something like this:

cpu0                   cpu1
                       ath10k_wmi_op_ep_tx_credits
                       ieee80211_iterate_active_interfaces_atomic
                       ath10k_wmi_beacon_send_ref_nowait
                       # preempted
ieee80211_do_stop
clear sdata running
drv_remove_interface
wait_for_completion_timeout
ath10k_mac_vif_beacon_cleanup
free/unmap
                       gen_beacon_dma(bcn->xxx)
                       reinit_completion()
                       ath10k_wmi_cmd_send_nowait()

Interestingly, this also shows it has been a possible use-after-free
without your patch. This is probably a unlikely scenario because
there's a disproportionate amount of code between these flows for this
to happen.

There's also a memory leak because ath10k_mac_vif_beacon_free() exits
without freeing arvif->beacon if beacon_state is SENDING, which it
will be in in the above race case.

With the per-skb beaconing case (which is dead code by the way it
seems..) iommu wouldn't be happy either because we'd hand over an
unmapped paddr to ath10k_wmi_cmd_send_nowait(). That's assuming kernel
didn't crash due to use-after-free on arvif->beacon (stored in `bcn`
on stack) before that.

This feels like a candidate for rcu if you wanted to have it fixed
neatly. It would fix both use-after-free and the locking conundrum -
ath10k_remove_interface() could NULL the pointer, call_rcu() and
wait_for_completion() in that call handler, and when its called back
unmap/free the beacon. This would require a refactor to how beacon
stuff is stored/maintained - a helper structure that would need carry
the beacon stuff + rcu_head, and this structure would be allocated and
assigned with rcu_assign_pointer() to a pointer in ath10k_vif. But I
feel like it's asking for unbound allocations or other bugs,
especially since I'm...

Not sure if it makes any sense. I would probably just rip out the
arvif->beacon code completely instead and keep arvif->beacon_buf only
(memcpy on swba and immediatelly free the skbuff). No point in fixing
dead code, is there?

Since you have beacon tx completion you can consider preventing beacon
corruption, similar to how disabled vsync causes frame tearing, by
avoiding memcpy() if completion for previous one didn't come. Not sure
how relevant that is though because if fw/hw are having hard time
transmitting a beacon then the RF condition must be really harsh and
unusably bad anyway.



Michał




[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux