Search Linux Wireless

Re: Re: Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown handler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Brian,

> -----Original Message-----
> From: Brian Norris [mailto:briannorris@xxxxxxxxxxxx]
> Sent: 2018年1月10日 6:01
> To: Xinming Hu <huxm@xxxxxxxxxxx>; Kalle Valo <kvalo@xxxxxxxxxxxxxx>
> Cc: Kalle Valo <kvalo@xxxxxxxxxxxxxx>; Linux Wireless
> <linux-wireless@xxxxxxxxxxxxxxx>; Dmitry Torokhov <dtor@xxxxxxxxxx>;
> rajatja@xxxxxxxxxx; Zhiyuan Yang <yangzy@xxxxxxxxxxx>; Tim Song
> <songtao@xxxxxxxxxxx>; Cathy Luo <cluo@xxxxxxxxxxx>; James Cao
> <jcao@xxxxxxxxxxx>; Ganapathi Bhat <gbhat@xxxxxxxxxxx>; Ellie Reeves
> <ellierevves@xxxxxxxxx>; Christoph Hellwig <hch@xxxxxx>
> Subject: [EXT] Re: Re: [PATCH] mwifiex: cancel pcie/sdio work in
> remove/shutdown handler
> 
> External Email
> 
> ----------------------------------------------------------------------
> + Christopher
> 
> Hi Simon and Kalle,
> 
> On Tue, Jan 09, 2018 at 11:42:21AM +0000, Xinming Hu wrote:
> > Hi,
> >
> > > -----Original Message-----
> > > From: Kalle Valo [mailto:kvalo@xxxxxxxxxxxxxx]
> > > Sent: 2018年1月9日 15:40
> > > To: Brian Norris <briannorris@xxxxxxxxxxxx>
> > > Cc: Xinming Hu <huxm@xxxxxxxxxxx>; Linux Wireless
> > > <linux-wireless@xxxxxxxxxxxxxxx>; Dmitry Torokhov <dtor@xxxxxxxxxx>;
> > > rajatja@xxxxxxxxxx; Zhiyuan Yang <yangzy@xxxxxxxxxxx>; Tim Song
> > > <songtao@xxxxxxxxxxx>; Cathy Luo <cluo@xxxxxxxxxxx>; James Cao
> > > <jcao@xxxxxxxxxxx>; Ganapathi Bhat <gbhat@xxxxxxxxxxx>; Ellie Reeves
> > > <ellierevves@xxxxxxxxx>
> > > Subject: [EXT] Re: [PATCH] mwifiex: cancel pcie/sdio work in
> > > remove/shutdown handler
> > >
> > > External Email
> > >
> > > --------------------------------------------------------------------
> > > -- Brian Norris <briannorris@xxxxxxxxxxxx> writes:
> > >
> > > >> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > >> @@ -310,6 +310,8 @@ static void mwifiex_pcie_remove(struct
> > > >> pci_dev
> > > *pdev)
> > > >>  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
> > > >>  	}
> > > >>
> > > >> +	cancel_work_sync(&card->work);
> > > >> +
> > > >
> > > > Just FYI, this "fix" is not a real fix. It will likely paper over
> > > > some of your bugs (where, e.g., the FW shutdown command times out
> > > > in the previous couple of lines), but this highlights the fact
> > > > that there are other races that could trigger the same behavior.
> > > > You're not fixing those.
> > > >
> > > > For example, what if somebody initiates a scan or other nl80211
> > > > command between the above line and mwifiex_remove_card()? That
> > > command
> > > > could potentially time out too.
> > > >
> >
> > The hardware status have been reset before downloading the last
> > command(FUNC SHUTDOWN), in this way, follow commands  download will
> be
> > ignored and warned.
> 
> Hmm, I suppose that's true. So the race I'm talking about probably can't
> happen usually. What about in manufacturing mode
> or !FIRMWARE_READY_PCIE though? Those cases don't shut down the
> firmware. Can we still have outstanding timeouts in those cases?
> 
> Anyway, I still think there's a problem though, and this patch is just going to
> make things worse. See below.
> 
> > > > The proper fix would be to institute some kind of mutual exclusion
> > > > (locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(),
> > > > so that they can't occur at the same time.
> > > >
> >
> > I am not sure whether there is any mutual exclusion protect between
> pcie_reset and pcie_remove in pcie core.
> > But it looks a different race.
> > We still need this fix, right?
> 
> Good point. Previously, there wasn't any such exclusion, and that's why races
> like the above were even more likely. But as of 4.13, now there
> *is* exclusion. See commit b014e96d1abb ("PCI: Protect
> pci_error_handlers->reset_notify() usage with device_lock()"). That incidentally
> means that you're creating a deadlock with this patch! [1]
> 
> If we start a timeout/reset sequence in mwifiex_init_shutdown_fw() (called
> from remove()), then you'll eventually have pci_reset_function() waiting on the
> device lock, but mwifiex_pcie_remove() will be holding the device lock already,
> and now (with your patch), remove() will be waiting on the worker thread to
> finish pci_reset_function()...deadlock!
> 
> I actually think that the above patch (adding device_lock()) resolves most of the
> race but introduces a possible deadlock. I believe an easy solution is just to
> switch to pci_try_reset_function() instead? That will just abort a reset attempt
> if we're in the middle of removing the device. Problem solved? Diff appended,
> but I'll send out a real version if that looks right. Can you test your original
> problem with the above commit from Christopher, as well as the appended
> diff?
> 

Since I don't have the original customer platform, which is 4.1 based.
Just run the stress test on my 4.14, with commands:
while true; do rmmod mwifiex;insmod $mwd/mwifiex.ko; sleep 1; insmod $mwd/mwifiex_pcie.ko; sleep 1; rmmod mwifiex_pcie & echo "1" > /sys/kernel/debug/mwifiex/mlan0/reset;done &

Till now, I didn't hit deadlock with/without below diff, though it looks finally will be reached, according to the above explanation.  
I guess, maybe rmmod operation is slowly then sysfs operation.

> > Regards,
> > Simon
> > > > Unfortunately, I only paid attention to this after Kalle already
> > > > applied this patch. Personally, I'd prefer this patch not get
> > > > applied, since it's a bad solution to an obvious problem, which
> > > > instead leaves a subtle problem that perhaps no one will bother fixing.
> > >
> > > I can revert it, that's not a problem. Can I use the text below as
> > > explanation for the revert?
> > >
> > > --------------------------------------------------------------------
> > > -- Brian Norris <briannorris@xxxxxxxxxxxx> says:
> > >
> > > Just FYI, this "fix" is not a real fix. It will likely paper over
> > > some of your bugs (where, e.g., the FW shutdown command times out in
> > > the previous couple of lines), but this highlights the fact that
> > > there are other races that could trigger the same behavior. You're not fixing
> those.
> > >
> > > For example, what if somebody initiates a scan or other nl80211
> > > command between the above line and mwifiex_remove_card()? That
> > > command could potentially time out too.
> > >
> > > The proper fix would be to institute some kind of mutual exclusion
> > > (locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(),
> > > so that they can't occur at the same time.
> > >
> > > --------------------------------------------------------------------
> > > --
> 
> Hi Kalle, thanks for the above. With the new information from above, I think
> that there's a more accurate description, like the following:
> 
> ---
> Brian Norris <briannorris@xxxxxxxxxxxx> says:
> 
> The "fix" in question might not actually fix all related problems, and it also looks
> like it can cause a deadlock. Since commit b014e96d1abb ("PCI: Protect
> pci_error_handlers->reset_notify() usage with device_lock()"), the race in
> question is actually resolved (PCIe reset cannot happen at the same time as
> remove()). Instead, this "fix"
> just introduces a deadlock where mwifiex_pcie_card_reset_work() is waiting on
> device_lock, which is held by PCIe device remove(), which is waiting
> on...mwifiex_pcie_card_reset_work().
> 
> The proper thing to do is just to fix the deadlock. Patch for this will come
> separately.
> ---
> 
> Brian
> 
> [1] Technically, the deadlock is already there, since
> mwifiex_remove_card() eventually calls pcie.c's ->cleanup_if(), which also calls
> cancel_work_sync(). But your patch doesn't help...
> 
> Untested diff:
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index cd314946452c..5da3d6ccf5f2 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -2781,7 +2781,7 @@ static void mwifiex_pcie_card_reset_work(struct
> mwifiex_adapter *adapter)  {
>  	struct pcie_service_card *card = adapter->card;
> 
> -	pci_reset_function(card->dev);
> +	pci_try_reset_function(card->dev);
>  }
> 
>  static void mwifiex_pcie_work(struct work_struct *work)



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux