Hi,

On 2024-09-17 12:15 +03:00, Felix Fietkau wrote:
> On 17.09.24 08:17, Kalle Valo wrote:
>> Lorenzo Bianconi <lorenzo@xxxxxxxxxx> writes:
>>
>>>> Hi,
>>>>
>>>> I ran into some bug messages while testing linux-next on a MT8186
>>>> Magneton Chromebook (mt8186-corsola-magneton-sku393218). It boots
>>>> to the OS, but at least Wi-Fi and Bluetooth are unavailable.
>>>>
>>>> As a start, I tried reverting commit abbd838c579e ("Merge tag
>>>> 'mt76-for-kvalo-2024-09-06' of https://github.com/nbd168/wireless")
>>>> and it works fine after that. Didn't have time to do a full bisect,
>>>> but will try if nobody has any immediate opinions.
>>>>
>>>> There are a few traces, here's some select lines to catch your attention,
>>>> not sure how informational they are:
>>>>
>>>> [   16.040525] kernel BUG at net/core/skbuff.c:2268!
>>>> [   16.040531] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>>> [   16.040803] CPU: 3 UID: 0 PID: 526 Comm: mt76-sdio-txrx Not tainted
>>>> 6.11.0-next-20240916-deb-00002-g7b544e01c649 #1
>>>> [   16.040897] Call trace:
>>>> [   16.040899]  pskb_expand_head+0x2b0/0x3c0
>>>> [   16.040905]  mt76s_tx_run_queue+0x274/0x410 [mt76_sdio]
>>>> [   16.040909]  mt76s_txrx_worker+0xe4/0xac8 [mt76_sdio]
>>>> [   16.040914]  mt7921s_txrx_worker+0x98/0x1e0 [mt7921s]
>>>> [   16.040924]  __mt76_worker_fn+0x80/0x128 [mt76]
>>>> [   16.040934]  kthread+0xe8/0xf8
>>>> [   16.040940]  ret_from_fork+0x10/0x20
>>>
>>> Hi,
>>>
>>> I guess this issue has been introduced by the following commit:
>>>
>>> commit 3688c18b65aeb2a1f2fde108400afbab129a8cc1
>>> Author: Felix Fietkau <nbd@xxxxxxxx>
>>> Date:   Tue Aug 27 11:30:01 2024 +0200
>>>
>>>     wifi: mt76: mt7915: retry mcu messages
>>>
>>>     In some cases MCU messages can get lost. Instead of failing completely,
>>>     attempt to recover by re-sending them.
>>>
>>>     Link: https://patch.msgid.link/20240827093011.18621-14-nbd@xxxxxxxx
>>>     Signed-off-by: Felix Fietkau <nbd@xxxxxxxx>
>>>
>>>
>>> In particular, skb_get() in mt76_mcu_skb_send_and_get_msg() is bumping skb users
>>> refcount (making the skb shared) and pskb_expand_head() (run by __skb_grow() in
>>> mt76s_tx_run_queue()) does not like shared skbs.
>>>
>>> @Felix: any input on it?
>
> Sorry about that. Please try this patch, it should probably resolve this issue:
>
> ---
> --- a/drivers/net/wireless/mediatek/mt76/mcu.c
> +++ b/drivers/net/wireless/mediatek/mt76/mcu.c
> @@ -84,13 +84,15 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
>  	mutex_lock(&dev->mcu.mutex);
>  
>  	if (dev->mcu_ops->mcu_skb_prepare_msg) {
> +		orig_skb = skb;
>  		ret = dev->mcu_ops->mcu_skb_prepare_msg(dev, skb, cmd, &seq);
>  		if (ret < 0)
>  			goto out;
>  	}
>  
>  retry:
> -	orig_skb = skb_get(skb);
> +	if (orig_skb)
> +		skb_get(orig_skb);
>  	ret = dev->mcu_ops->mcu_skb_send_msg(dev, skb, cmd, &seq);
>  	if (ret < 0)
>  		goto out;
> @@ -105,7 +107,7 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
>  	do {
>  		skb = mt76_mcu_get_response(dev, expires);
>  		if (!skb && !test_bit(MT76_MCU_RESET, &dev->phy.state) &&
> -		    retry++ < dev->mcu_ops->max_retry) {
> +		    orig_skb && retry++ < dev->mcu_ops->max_retry) {
>  			dev_err(dev->dev, "Retry message %08x (seq %d)\n",
>  				cmd, seq);
>  			skb = orig_skb;
>

Tested-by: Alper Nebi Yasak <alpernebiyasak@xxxxxxxxx>

Thanks!