On 4.4.23. 03:30, Ajay.Kathat@xxxxxxxxxxxxx wrote:
> On 4/3/23 07:24, Kirill Buksha wrote:
>> On 16.12.22. 11:18, Michael Walle wrote:
>>> Hi,
>>>
>>> On 22/12/09 02:14, Ajay.Kathat@xxxxxxxxxxxxx wrote:
>>>> No progress yet. I tried to simulate the condition a few times but was
>>>> unable to see the exact failure in my setup, so I need to try more.
>>> Shouldn't it also be possible to see the issue by code reading? I've
>>> provided the call tree in my previous mail and my concerns regarding
>>> the locking. Either I'm missing something there or there is no
>>> locking between these threads, which could cause this issue.
>>>
>>>> For the other "FW not responding" continuous logs, I got some clues.
>>>> I will probably try to send that patch first.
>>> Ok, let me know if you have some patches, I'm happy to test them.
>>>
>>> -michael
>>>
>> Hello,
>>
>> I faced the same kernel oops issue. After analyzing my logs and some
>> brief debugging, I agree with Michael: the problem seems to be accessing
>> the scan_result pointer after it has been set to NULL.
> I have submitted a patch [1] which fixes the scan_result NULL pointer
> exception issue. The submitted patch handles the synchronization between
> mac_close() and asynchronous interrupts from firmware. Basically, it
> blocks the execution of mac_close() until all pending work items are
> completed, and afterward no new work can be added since the close is in
> progress. It is worth trying that patch once and checking its behavior.
>
> 1.
> https://lore.kernel.org/linux-wireless/20230404012010.15261-1-ajay.kathat@xxxxxxxxxxxxx/T/#u

Thank you for the patch. I will take a look at it and test it when I have time.
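A minimal userspace sketch of the ordering the patch summary describes (close refuses new work, waits for already-queued work to finish, and only then clears the callback pointer), assuming POSIX threads as a stand-in for the kernel's workqueue and locking primitives. All names here (struct fake_wilc, queue_work_item(), run_work_item(), mac_close_model(), run_demo()) are hypothetical; this is not the wilc1000 driver code and not the submitted patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_ITEMS 3

/* Hypothetical stand-in for driver state; NOT the real struct wilc. */
struct fake_wilc {
	pthread_mutex_t lock;
	pthread_cond_t idle;             /* signalled when pending drops to 0 */
	int pending;                     /* work queued but not yet finished */
	bool closing;                    /* set by close: no new work accepted */
	void (*scan_result)(void *priv); /* cleared (NULLed) by close */
	void *priv;
};

static void fake_wilc_init(struct fake_wilc *w, void (*cb)(void *), void *priv)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->idle, NULL);
	w->pending = 0;
	w->closing = false;
	w->scan_result = cb;
	w->priv = priv;
}

/* Producer side (stands in for the firmware-interrupt path). */
static bool queue_work_item(struct fake_wilc *w)
{
	bool accepted;

	pthread_mutex_lock(&w->lock);
	accepted = !w->closing;          /* refuse work once close has begun */
	if (accepted)
		w->pending++;
	pthread_mutex_unlock(&w->lock);
	return accepted;
}

/* Worker side: run one queued item, then wake close if it was the last. */
static void run_work_item(struct fake_wilc *w)
{
	void (*cb)(void *);
	void *priv;

	pthread_mutex_lock(&w->lock);
	cb = w->scan_result;
	priv = w->priv;
	pthread_mutex_unlock(&w->lock);

	/* Invariant: close cannot clear the callback while pending > 0. */
	assert(cb != NULL);
	cb(priv);

	pthread_mutex_lock(&w->lock);
	if (--w->pending == 0)
		pthread_cond_broadcast(&w->idle);
	pthread_mutex_unlock(&w->lock);
}

/* Close side: refuse new work, wait for queued work to drain, then clear. */
static void mac_close_model(struct fake_wilc *w)
{
	pthread_mutex_lock(&w->lock);
	w->closing = true;
	while (w->pending > 0)
		pthread_cond_wait(&w->idle, &w->lock);
	w->scan_result = NULL;           /* no queued item can still use it */
	pthread_mutex_unlock(&w->lock);
}

static void count_cb(void *priv) { (*(int *)priv)++; }

static void *worker(void *arg)
{
	for (int i = 0; i < NUM_ITEMS; i++)
		run_work_item(arg);
	return NULL;
}

/* Queue items, close while the worker runs; return how many callbacks ran. */
int run_demo(bool *rejected_after_close, bool *cb_cleared)
{
	struct fake_wilc w;
	pthread_t t;
	int calls = 0;

	fake_wilc_init(&w, count_cb, &calls);
	for (int i = 0; i < NUM_ITEMS; i++)
		queue_work_item(&w);
	pthread_create(&t, NULL, worker, &w);
	mac_close_model(&w);             /* blocks until all items have run */
	pthread_join(t, NULL);

	*rejected_after_close = !queue_work_item(&w);
	*cb_cleared = (w.scan_result == NULL);
	pthread_mutex_destroy(&w.lock);
	pthread_cond_destroy(&w.idle);
	return calls;
}
```

run_demo() is meant to be called from a small main() built with -pthread. Because mac_close_model() waits for pending work before clearing scan_result, the assert in run_work_item() cannot fire; clearing the pointer first and waiting afterwards would roughly reproduce the use-after-NULL situation discussed in this thread.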
>> Regarding the solution: if there is a race between two threads (as
>> Michael described earlier), then I think that a locking mechanism will
>> be the most reliable solution. We ran into problems during
>> deinitialization, but the driver contains two more places (the
>> handle_scan_done() and wilc_disconnect() functions in wilc1000/hif.c)
>> where scan_result is set to NULL.
>>
>> I use NetworkManager to manage networks, and I have experienced the same
>> failure multiple times when switching from one WiFi network to another.
>> Keep in mind that switching between networks calls the wilc_disconnect()
>> and wilc_deinit() functions, and it is not yet clear which one is
>> causing the core dump. I think it's worth at least taking a look at
>> these areas of the code. What do you think?
> If possible, please share the sequence (commands) for the WiFi network
> switching scenario. It looks like both functions (mac_close and
> disconnect) are getting called from user context. mac_close() is a
> netdevice callback, whereas wilc_disconnect() is a cfg80211 callback.
> Generally, wilc_disconnect() should be enough to disconnect from the
> current WiFi network without bringing the complete interface down. Is
> NetworkManager closing the interface (mac_close()) before switching the
> WiFi network?
>
> Regards,
> Ajay

The commands are as follows:

    while true; do nmcli c up wlan0-client; nmcli c up wlan0-client-2; done

It takes about 5 minutes of this loop before I see the core dump. I see
the following message after every command:

    ...
    wilc1000_sdio mmc0:0001:1 wlan0: Deinitializing wilc1000...
    ...

The message above comes from the wilc_wlan_deinitialize() function, which
is called from wilc_mac_close(). It seems that the interface is closed
between connections.

Best regards,
Kirill Buksha