> From: Kimberly Brown <kimbrownkd@xxxxxxxxx> > > ... > > But as you pointed, at least for sub-channels, channel->ringbuffer_page > > can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and > > the "attribute->show()" could crash when the race happens. > > Adding channel_mutex here seems to be able to fix the race for > > sub-channels, as the same channel_mutex is used in > vmbus_disconnect_ring(). > > > > For a primary channel, vmbus_close() -> vmbus_free_ring() can still > > free channel->ringbuffer_page without notifying the "attribute->show()". > > We may also need to acquire/release the channel_mutex in vmbus_close()? > > > > > Actually, there should be checks that "chan" is not null and that > > > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll > > > need to fix that. > > I suppose "chan" can not be NULL here (see the above). > > > > Checking "chan->state" may not help to completely fix the race, because > > there is no locking/synchronization code in > > vmbus_close_internal() when we test and change "channel->state". > > > > The calls to vmbus_close_internal() for the subchannels and the primary > channel are protected with channel_mutex in vmbus_disconnect_ring(). > This prevents "channel->state" from changing while "attribute->show()" is > running. Ah, I think you're right. > > I guess we may need to check if channel->ringbuffer_page is NULL in > > the "attribute->show()". > > > > For the primary channel, vmbus_free_ring() is called after the > return from vmbus_disconnect_ring(). Therefore, the primary channel's > state is changed before "channel->ringbuffer_page" is set to NULL. > Checking the channel state should be sufficient to prevent the ring > buffers from being freed while "attribute->show()" is running. The > ring buffers can't be freed until the channel's state is changed, and > the channel state change is protected by the mutex. I think you're right (I noticed in a previous mail you mentioned you would improve your patch to check "chan->state" with the mutax held). > I think checking that "channel->ringbuffer_page" is not NULL would > also work, but, as you stated, we would need to aquire/release > channel_mutex in vmbus_close(). Then it looks unnecessary to check "channel->ringbuffer_page". > > PS, to prove that a race condition does exist and can really cause a panic or > > something, I usually add some msleep() delays in different paths so that I > > can reproduce the crash every time I take a special action, e.g. when I read > > the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove > > a patch can indeed help, at least it can fix the crash which would happen > > without the patch. :-) > > > > Thanks! I was able to free the ring buffers while "attribute->show()" > was running, which caused a null pointer dereference bug. As expected, > the mutex lock fixed it. Awesome! -- Dexuan _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel