Dexuan Cui <decui@xxxxxxxxxxxxx> writes: >> -----Original Message----- >> From: Vitaly Kuznetsov >> Sent: Monday, July 13, 2015 20:19 >> Subject: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device >> shutdown >> >> When a new subchannel offer from host comes during device shutdown (e.g. >> when a netvsc/storvsc module is unloadedshortly after it was loaded) a >> crash can happen as vmbus_process_offer() is not anyhow serialized with >> vmbus_remove(). > > How about vmbus_onoffer_rescind()? > It's not serialized with vmbus_remove() either, so I think there is an issue too? > > I remember when 'rmmod hv_netvsc', we get a rescind-offer message for > each subchannel. > True, I think we have a race with rescind messages as well, we just never saw crashes for some reason. I'll think how we can make the fix more general. >> As an example we can try calling subchannel create >> callback when the module is already unloaded. >> The following crash was observed while keeping loading/unloading hv_netvsc >> module on 64-CPU guest: >> >> hv_netvsc vmbus_14: net device safe to remove >> BUG: unable to handle kernel paging request at 000000000000a118 >> IP: [<ffffffffa01c1b21>] netvsc_sc_open+0x31/0xb0 [hv_netvsc] >> PGD 1f3946a067 PUD 1f38a5f067 PMD 0 >> Oops: 0000 [#1] SMP >> ... >> Call Trace: >> [<ffffffffa0055cd7>] vmbus_onoffer+0x477/0x530 [hv_vmbus] >> [<ffffffff81092e1f>] ? move_linked_works+0x5f/0x80 >> [<ffffffffa0055fd3>] vmbus_onmessage+0x33/0xa0 [hv_vmbus] >> [<ffffffffa0052f81>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus] >> [<ffffffff81095cde>] process_one_work+0x18e/0x4e0 >> ... >> >> The issue cannot be solved by just resetting sc_creation_callback on >> driver removal as while we search for the parent channel with channel_lock >> held we release it after the channel was found and it can disapper beneath >> our feet while we're still in vmbus_process_offer(); >> >> Introduce new sc_create_lock mutex and take it in vmbus_remove() to ensure >> no new subchannels are created after we started the removal procedure. >> Check its state with mutex_trylock in vmbus_process_offer(). > > In my 8-CPU VM, I can very easily reproduce the panic by > 1. running > while ((1)); do modprobe -r hv_netvsc; modprobe hv_netvsc; sleep 10; done. > > and > 2. in vmbus_onoffer_rescind(), we sleep 3s after a subchannel is added into > the primary channel's sc_list (and before the sc_creation_callback is invoked): > (I added line 275) > > 262 if (!fnew) { > 263 /* > 264 * Check to see if this is a sub-channel. > 265 */ > 266 if (newchannel->offermsg.offer.sub_channel_index != 0) { > 267 /* > 268 * Process the sub-channel. > 269 */ > 270 newchannel->primary_channel = channel; > 271 spin_lock_irqsave(&channel->lock, flags); > 272 list_add_tail(&newchannel->sc_list, &channel->sc_list); > 273 channel->num_sc++; > 274 spin_unlock_irqrestore(&channel->lock, flags); > 275 ssleep(3); > 276 } else > 277 goto err_free_chan; > 278 } It is possible to see crashes even without such intrumentation, move CPUs and slower host will do the job. Thanks, -- Vitaly _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel