On 7/2/2022 7:35 PM, Danny van Heumen wrote:
In case of buggy firmware, brcmfmac may perform a hardware reset. If during
reset and subsequent probing an early failure occurs, a memory region is
accidentally double-freed. With hardened memory allocation enabled, this error
will be detected.
- return early where appropriate to skip unnecessary clean-up.
- set '.freezer' pointer to NULL to prevent double-freeing under possible
other circumstances and to re-align result under various different
behaviors of memory allocation freeing.
- correctly claim host on func1 for disabling func2.
- after reset, do not initiate probing immediately, but rely on events.
Given a firmware crash, function 'brcmf_sdio_bus_reset' is called. It calls
'brcmf_sdiod_remove', then follows up with 'brcmf_sdiod_probe' to reinitialize
the hardware. If 'brcmf_sdiod_probe' fails to "set F1 blocksize", it exits
early, which includes calling 'brcmf_sdiod_remove'. In both cases
'brcmf_sdiod_freezer_detach' is called to free allocated '.freezer', which
has not yet been re-allocated the second time.
Stacktrace of (failing) hardware reset after firmware-crash:
Code: b9402b82 8b0202c0 eb1a02df 54000041 (d4210000)
ret_from_fork+0x10/0x20
kthread+0x154/0x160
worker_thread+0x188/0x504
process_one_work+0x1f4/0x490
brcmf_core_bus_reset+0x34/0x44 [brcmfmac]
brcmf_sdio_bus_reset+0x68/0xc0 [brcmfmac]
brcmf_sdiod_probe+0x170/0x21c [brcmfmac]
brcmf_sdiod_remove+0x48/0xc0 [brcmfmac]
kfree+0x210/0x220
__slab_free+0x58/0x40c
Call trace:
x2 : 0000000000000040 x1 : fffffc00002d2b80 x0 : ffff00000b4aee40
x5 : ffff8000013fa728 x4 : 0000000000000001 x3 : ffff00000b4aee00
x8 : ffff800009967ce0 x7 : ffff8000099bfce0 x6 : 00000006f8005d01
x11: ffff8000099bfce0 x10: 00000000fffff000 x9 : ffff8000083401d0
x14: 0000000000000000 x13: 657a69736b636f6c x12: 6220314620746573
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
x20: fffffc00002d2ba0 x19: fffffc00002d2b80 x18: 0000000000000000
x23: ffff00000b4aee00 x22: ffff00000b4aee00 x21: 0000000000000001
x26: ffff00000b4aee00 x25: ffff0000f7753705 x24: 000000000001288a
x29: ffff80000a22bbf0 x28: ffff000000401200 x27: 000000008020001a
sp : ffff80000a22bbf0
lr : kfree+0x210/0x220
pc : __slab_free+0x58/0x40c
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Workqueue: events brcmf_core_bus_reset [brcmfmac]
Hardware name: Pine64 Pinebook Pro (DT)
CPU: 2 PID: 639 Comm: kworker/2:2 Tainted: G C 5.16.0-0.bpo.4-arm64 #1 Debian 5.16.12-1~bpo11+1
nvmem_rockchip_efuse industrialio_triggered_buffer videodev snd_soc_core snd_pcm_dmaengine kfifo_buf snd_pcm io_domain mc industrialio mt>
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reje>
Internal error: Oops - BUG: 0 [#1] SMP
kernel BUG at mm/slub.c:379!
Signed-off-by: Danny van Heumen <danny@xxxxxxxxxxxxxxxxx>
---
It is good practice to throw in a changelog here so people know what has
changed since earlier version of the patch.
---
.../broadcom/brcm80211/brcmfmac/bcmsdh.c | 31 ++++++++++++-------
.../broadcom/brcm80211/brcmfmac/sdio.c | 10 +-----
2 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index ac02244a6fdf..dd634edaa0b3 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
[...]
@@ -1096,12 +1093,24 @@ static void brcmf_ops_sdio_remove(struct sdio_func *func)
if (bus_if) {
sdiodev = bus_if->bus_priv.sdio;
+ if (func->num != 1) {
+ /* Satisfy kernel expectation that the irq is released once the
+ * '.remove' callback has executed, while respecting the design
+ * that removal is executed for 'sdiodev', instead of individual
+ * function.
+ */
+ brcmf_dbg(SDIO, "Only release irq for function %d", func->num);
+ sdio_claim_host(sdiodev->func1);
+ sdio_release_irq(func);
+ sdio_release_host(sdiodev->func1);
+ return;
+ }
+
+ /* func 1: so do full clean-up and removal */
+
The problem is that upon driver unload we get remove for function 2 and
then for function 1. Upon mmc_hw_reset() we get a remove for function 1
and then for function 2. So in the scenario of mmc_hw_reset() we free
sdiodev upon func 1 removal and then for func 2 removal we have a
use-after-free of sdiodev. The code currently relies on the order in
which remove callback is done. To make it more robust we could throw in
a refcount for sdiodev and only do the full clean-up when refcount hits
zero.
Regards,
Arend