Hi, On Mon, Jul 27, 2020 at 6:45 PM Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote: > > syzbot is reporting that del_timer_sync() is called from > mwifiex_usb_cleanup_tx_aggr() from mwifiex_unregister_dev() without > checking timer_setup() from mwifiex_usb_tx_init() was called [1]. > Since mwifiex_usb_prepare_tx_aggr_skb() is calling del_timer() if > is_hold_timer_set == true, use the same condition for del_timer_sync(). > > [1] https://syzkaller.appspot.com/bug?id=fdeef9cf7348be8b8ab5b847f2ed993aba8ea7b6 > > Reported-by: syzbot <syzbot+373e6719b49912399d21@xxxxxxxxxxxxxxxxxxxxxxxxx> > Cc: Ganapathi Bhat <gbhat@xxxxxxxxxxx> > Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > --- > A patch from Ganapathi Bhat ( https://patchwork.kernel.org/patch/10990275/ ) is stalling > at https://lore.kernel.org/linux-usb/MN2PR18MB2637D7C742BC235FE38367F0A09C0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ . > syzbot by now got this report for 10000 times. Do we want to go with this simple patch? Sorry, that stall is partly my fault, and partly Ganapathi's. It doesn't help that it took him 4 months to reply to my questions, so I completely lost even the tiny bit of context I had managed to build up in my head at initial review time... and so it's still buried in the dark corners of my inbox. (I think I'll go archive that now, because it really deserves a better sell than it had initially, if Ganapathi really wants to land it.) > drivers/net/wireless/marvell/mwifiex/usb.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c b/drivers/net/wireless/marvell/mwifiex/usb.c > index 6f3cfde..04a1461 100644 > --- a/drivers/net/wireless/marvell/mwifiex/usb.c > +++ b/drivers/net/wireless/marvell/mwifiex/usb.c > @@ -1353,7 +1353,8 @@ static void mwifiex_usb_cleanup_tx_aggr(struct mwifiex_adapter *adapter) > skb_dequeue(&port->tx_aggr.aggr_list))) > mwifiex_write_data_complete(adapter, skb_tmp, > 0, -1); > - del_timer_sync(&port->tx_aggr.timer_cnxt.hold_timer); > + if (port->tx_aggr.timer_cnxt.is_hold_timer_set) I believe if we ever actually started aggregation, then the timer can be active at this point, and thus, the access to 'is_hold_timer_set' is racy. This *probably* deserves a better refactor, but in absence of that (and a better explanation than Ganapathi gave), I think you at least need to hold port->tx_aggr_lock. So perhaps (totally untested): spin_lock_bh(&port->tx_aggr_lock); if (port->tx_aggr.timer_cnxt.is_hold_timer_set) { port->tx_aggr.timer_cnxt.is_hold_timer_set = false; spin_unlock_bh(&port->tx_aggr_lock); /* Timer could still be running, but it can't be restarted at this point, so this is safe. */ del_timer_sync(&port->tx_aggr.timer_cnxt.hold_timer); } else { spin_unlock_bh(&port->tx_aggr_lock); } Otherwise, I think this is fine: Reviewed-by: Brian Norris <briannorris@xxxxxxxxxxxx> I also believe mwifiex_usb_prepare_tx_aggr_skb() needs to stop using del_timer() (without the _sync()), because otherwise we might have deactivated the timer already but not ensured that it has completely finished executing on other CPUs. But that is probably orthogonal to the current patch. (Again, so much in this driver needs refactoring.) Side note: this entire TX aggregation feature for USB has been hidden behind the mwifiex.aggr_ctrl module param since its introduction, which has always been disabled by default. I wonder whether anybody is *really* testing it, or whether it's 100% broken, as with many things in this driver... Brian > + del_timer_sync(&port->tx_aggr.timer_cnxt.hold_timer); > port->tx_aggr.timer_cnxt.is_hold_timer_set = false; > port->tx_aggr.timer_cnxt.hold_tmo_msecs = 0; > } > -- > 1.8.3.1 >