Hi

On Thu, May 8, 2014 at 8:18 PM, Rajkumar Manoharan <rmanohar@xxxxxxxxxxxxxxxx> wrote:
> On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
>> ah->caldata may be NULL if no channel is selected. Check for that before
>> accessing it.
>>
>> Signed-off-by: David Herrmann <dh.herrmann@xxxxxxxxx>
>> ---
>> Hi
>>
>> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
>> is freed while we run in hw_per_calibration(). However, this patch fixes serious
>> kernel panics with wifi-P2P on my machine.
>>
>> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
>> correct fix would be to synchronously stop any running hw-calibration before
>> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
>> this small workaround.
>>
> David,
>
> Whenever the DUT is moving to off-channel, ah->caldata is set to NULL in
> hw_reset. As you mentioned, before doing hw_reset, the on-going calibration is stopped
> synchronously. I using ar9280 for p2p (GO & CLI) validation. Somehow i do not observe
> the panics. Is there a easiest way to reproduce the problem. Are you
> using wireless-testing tree? Thanks for reporting the problem. Will try
> to fix asap.

Reproducing it is actually quite easy on my machine. Whenever I start a
P2P connect from my Android phone to my Linux host and _immediately_ accept
it (via p2p_connect on wpas), I get the kernel panic. Adding the NULL
protection fixes this. However, if I delay accepting the connection (i.e.,
issuing p2p_connect by hand instead of automatically), I cannot reproduce
the bug. Furthermore, on my slower Intel Core 2 Duo the bug is much less
likely to happen, and on my ARM machine I have never seen it. Given that my
main machine is an Intel Haswell (hsw) quad-core, I guess it's a simple
race condition.

I also added a printk() whenever caldata is NULL and noticed that it fires
only during the first 2 or 3 runs. After that, it never happened again.
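To make the race and the suggested fix (stop calibration before setting ah->caldata to NULL) concrete, here is a minimal userspace sketch. It is NOT ath9k code: `struct hw_state`, `struct cal_data`, `hw_reset()` and `per_calibration()` below are hypothetical stand-ins, and a pthread mutex is assumed in place of whatever serialization the real driver uses (in the kernel that could be a mutex, a spinlock, or a synchronous cancel of the calibration timer/work; which one fits is not established here). The idea is that the NULL check and the use of caldata happen under the same lock that the reset path takes, so the pointer cannot change in between, and two resets cannot interleave either.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver state in this thread.
 * These are NOT the real ath9k structures or functions. */
struct cal_data {
    int runs;                  /* counts completed calibration passes */
};

struct hw_state {
    pthread_mutex_t lock;      /* serializes resets and calibration */
    struct cal_data *caldata;  /* NULL while off-channel */
};

static void hw_state_init(struct hw_state *ah)
{
    int rc = pthread_mutex_init(&ah->lock, NULL);

    assert(rc == 0);
    (void)rc;                  /* avoid unused warning under NDEBUG */
    ah->caldata = NULL;
}

/* Hypothetical reset path: caldata is swapped only while holding the
 * lock, so it waits for any calibration pass already in progress, and
 * two concurrent resets are serialized against each other. */
static void hw_reset(struct hw_state *ah, struct cal_data *new_cal)
{
    pthread_mutex_lock(&ah->lock);
    ah->caldata = new_cal;     /* NULL when moving off-channel */
    pthread_mutex_unlock(&ah->lock);
}

/* Hypothetical periodic calibration: the NULL check (the workaround
 * from the patch) and the use of caldata happen under the same lock,
 * so the pointer cannot change between the check and the access.
 * Returns true if a calibration pass actually ran. */
static bool per_calibration(struct hw_state *ah)
{
    bool ran = false;

    pthread_mutex_lock(&ah->lock);
    if (ah->caldata != NULL) {
        ah->caldata->runs++;
        ran = true;
    }
    pthread_mutex_unlock(&ah->lock);
    return ran;
}
```

For example, after `hw_reset(&ah, NULL)` a following `per_calibration(&ah)` returns false instead of dereferencing NULL. Without the lock, the NULL check alone only narrows the window; it does not stop caldata from being cleared or freed between the check and the access.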
The bug happens on all Linux kernels I tested (from around 3.9 up to
linux-next). However, with my fix applied, every kernel after 3.13-stable
fails to transmit DHCP data: I can connect properly, but DHCP always times
out. I'm not sure why that happens and I'm still debugging it, but it's
quite likely a separate issue (if I find some time, I will bisect it).

I have now looked at the ath9k code and couldn't see any locking around
hw_reset at all. I don't know whether the wifi core / nl80211 locks this,
but what happens if two hw_resets race each other? Just a guess. I will
try to look into it tomorrow.

Thanks
David