> From: Kimberly Brown <kimbrownkd@xxxxxxxxx> > Sent: Monday, January 21, 2019 10:43 PM > > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct > > > kobject *kobj, > > > if (chan->state != CHANNEL_OPENED_STATE) > > > return -EINVAL; > > > > > > - return attribute->show(chan, buf); > > > + mutex_lock(&vmbus_connection.channel_mutex); > > > + ret = attribute->show(chan, buf); > > > + mutex_unlock(&vmbus_connection.channel_mutex); > > > + return ret; > > > } > > > > It looks this patch is already able to fix the original issue: > > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"), > > as chan->state can't be CHANNEL_OPENED_STATE when > > channel->ringbuffer_page is not set up yet, or has been freed. > > -- Dexuan > > I think that patch 6712cc9c2211 fixes the problem when the channel is > not set up yet, but it doesn't fix the problem when the channel is being > closed Yeah, now I see your point. > The channel could be freed after the check that "chan->state" is > CHANNEL_OPENED_STATE, while the "attribute->show()" function is running. IMO the channel struct itself can't be freed while the "attribute->show()" function is running, because I suppose the sysfs interface should have an extra reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs files are removed, the channel struct itself can't disappear. (I didn't check the related code very carefully, so I could be wrong. :-) ) But as you pointed, at least for sub-channels, channel->ringbuffer_page can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and the "attribute->show()" could crash when the race happens. Adding channel_mutex here seems to be able to fix the race for sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring(). For a primary channel, vmbus_close() -> vmbus_free_ring() can still free channel->ringbuffer_page without notifying the "attribute->show()". We may also need to acquire/release the channel_mutex in vmbus_close()? > Actually, there should be checks that "chan" is not null and that > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll > need to fix that. I suppose "chan" can not be NULL here (see the above). Checking "chan->state" may not help to completely fix the race, because there is no locking/synchronization code in vmbus_close_internal() when we test and change "channel->state". I guess we may need to check if channel->ringbuffer_page is NULL in the "attribute->show()". PS, to prove that a race condition does exist and can really cause a panic or something, I usually add some msleep() delays in different paths so that I can reproduce the crash every time I take a special action, e.g. when I read the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove a patch can indeed help, at least it can fix the crash which would happen without the patch. :-) -- Dexuan _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel