On 10/16/2024 5:37 AM, Jose Ignacio Tornos Martinez wrote: > If there is an error during some initialization realated to firmware, previous review comment missed: s/realated/related/ > the buffers dp->tx_ring[i].tx_status are released. > However this is released again when the device is unbinded (ath12k_pci), > and we get: > [ 41.271233] WARNING: CPU: 0 PID: 2098 at mm/slub.c:4689 free_large_kmalloc+0x4d/0x80 > [ 41.271246] Modules linked in: uinput snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc qrtr_mhi intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel kvm rapl qrtr snd_hda_codec_generic ath12k qmi_helpers snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi iTCO_wdt intel_pmc_bxt mac80211 snd_hda_codec iTCO_vendor_support libarc4 snd_hda_core snd_hwdep snd_seq snd_seq_device cfg80211 snd_pcm pcspkr i2c_i801 snd_timer i2c_smbus snd rfkill soundcore lpc_ich mhi virtio_balloon joydev xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_net virtio_blk virtio_console virtio_gpu net_failover failover virtio_dma_buf serio_raw fuse qemu_fw_cfg > [ 41.271284] CPU: 0 UID: 0 PID: 2098 Comm: bash Kdump: loaded Not tainted 6.12.0-rc1+ #29 > [ 41.271286] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 > [ 41.271287] RIP: 0010:free_large_kmalloc+0x4d/0x80 > [ 41.271289] Code: 00 10 00 00 48 d3 e0 f7 d8 81 e2 c0 00 00 00 75 2f 89 c6 48 89 df e8 82 ff ff ff f0 ff 4b 34 0f 85 59 0e ce 00 e9 5b 0e ce 00 <0f> 0b 80 3d c8 29 3c 02 00 0f 84 2d 0e ce 00 b8 00 f0 ff ff eb d1 > [ 41.271290] RSP: 0018:ffffa40881a33c50 EFLAGS: 00010246 > [ 41.271292] RAX: 000fffffc0000000 RBX: ffffe697c0278000 RCX: 0000000000000000 > [ 41.271293] RDX: ffffe697c0b60008 RSI: ffff8d00c9e00000 RDI: ffffe697c0278000 > [ 41.271294] RBP: ffff8d00c3af0000 R08: ffff8d00f215d0c0 R09: 0000000080400038 > [ 41.271294] R10: 0000000080400038 R11: 0000000000000000 R12: 0000000000000001 > [ 41.271295] R13: ffffffffc0ef8948 R14: ffffffffc0ef8948 R15: ffff8d00c1277560 > [ 41.271296] FS: 00007fd31e556740(0000) GS:ffff8d011e400000(0000) knlGS:0000000000000000 > [ 41.271297] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.271298] CR2: 00007f778d3ffb38 CR3: 00000000065dc000 CR4: 0000000000752ef0 > [ 41.271301] PKRU: 55555554 > [ 41.271302] Call Trace: > [ 41.271304] <TASK> > [ 41.271304] ? free_large_kmalloc+0x4d/0x80 > [ 41.271306] ? __warn.cold+0x93/0xfa > [ 41.271308] ? free_large_kmalloc+0x4d/0x80 > [ 41.271311] ? report_bug+0xff/0x140 > [ 41.271314] ? handle_bug+0x58/0x90 > [ 41.271316] ? exc_invalid_op+0x17/0x70 > [ 41.271317] ? asm_exc_invalid_op+0x1a/0x20 > [ 41.271321] ? free_large_kmalloc+0x4d/0x80 > [ 41.271323] ath12k_dp_free+0xdc/0x110 [ath12k] > [ 41.271337] ath12k_core_deinit+0x8d/0xb0 [ath12k] > [ 41.271345] ath12k_pci_remove+0x50/0xf0 [ath12k] > [ 41.271354] pci_device_remove+0x3f/0xb0 > [ 41.271356] device_release_driver_internal+0x19c/0x200 > [ 41.271359] unbind_store+0xa1/0xb0 FYI I didn't comment on this previously but <https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages> has some guidance on trimming backtraces in commit messages. > ... > > The issue is always reproducible from a VM because the MSI addressing > initialization is failing. > > In order to fix the issue, just check if the buffers were already released > and if they need to be released, in addition set to NULL for the checking. > > cc: stable@xxxxxxxxxxxxxxx > Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") > Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@xxxxxxxxxx> > --- > v2: > - Fix the commit size in Fixes > v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-3-jtornosm@xxxxxxxxxx/ > > drivers/net/wireless/ath/ath12k/dp.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c > index 789d430e4455..9d878d815f3c 100644 > --- a/drivers/net/wireless/ath/ath12k/dp.c > +++ b/drivers/net/wireless/ath/ath12k/dp.c > @@ -1277,8 +1277,12 @@ void ath12k_dp_free(struct ath12k_base *ab) > > ath12k_dp_rx_reo_cmd_list_cleanup(ab); > > - for (i = 0; i < ab->hw_params->max_tx_ring; i++) > - kfree(dp->tx_ring[i].tx_status); > + for (i = 0; i < ab->hw_params->max_tx_ring; i++) { > + if (dp->tx_ring[i].tx_status) { I still don't understand why this check is necessary. Why isn't just adding the "dp->tx_ring[i].tx_status = NULL" inside the loop sufficient? You previously wrote the "warning comes from free_large_kmalloc". Is that warning due to a NULL pointer, or due to a double free? I don't see how it could be from a NULL pointer since in kfree() the 1st thing it does is trace the entry, but then the 2nd thing it does is return "if (unlikely(ZERO_OR_NULL_PTR(object)))" So I'm suspecting a double free, and setting dp->tx_ring[i].tx_status = NULL should fix that issue. > + kfree(dp->tx_ring[i].tx_status); > + dp->tx_ring[i].tx_status = NULL; > + } > + } > > ath12k_dp_rx_free(ab); > /* Deinit any SOC level resource */