Dexuan Cui <decui@xxxxxxxxxxxxx> writes: >> -----Original Message----- >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx] >> Sent: Wednesday, February 4, 2015 1:01 AM >> To: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx >> Cc: Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui; Jason Wang >> Subject: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for >> vmbus channels >> >> free_channel() function frees the channel unconditionally so we need to make >> sure nobody has any link to it. This is not trivial and there are several >> examples of races we have: >> >> 1) In vmbus_onoffer_rescind() we check for channel existence with >> relid2channel() and then use it. This can go wrong if we're in the middle >> of channel removal (free_channel() was already called). >> >> 2) In process_chn_event() we check for channel existence with >> pcpu_relid2channel() and then use it. This can also go wrong. >> >> 3) vmbus_free_channels() just frees all channels, in case we're in the middle >> of vmbus_process_rescind_offer() crash is possible. >> >> The issue can be solved by holding vmbus_connection.channel_lock everywhere, >> however, it looks like a way to deadlocks and performance degradation. Get/put >> workflow fits here the best. >> >> Implement vmbus_get_channel()/vmbus_put_channel() pair instead of >> free_channel(). >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> >> --- >> drivers/hv/channel_mgmt.c | 45 >> ++++++++++++++++++++++++++++++++++++++------- >> drivers/hv/connection.c | 7 +++++-- >> drivers/hv/hyperv_vmbus.h | 4 ++++ >> include/linux/hyperv.h | 13 +++++++++++++ >> 4 files changed, 60 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c >> index 36bacc7..eb9ce94 100644 >> --- a/drivers/hv/channel_mgmt.c >> +++ b/drivers/hv/channel_mgmt.c >> @@ -147,6 +147,8 @@ static struct vmbus_channel *alloc_channel(void) >> return NULL; >> >> channel->id = atomic_inc_return(&chan_num); >> + atomic_set(&channel->count, 1); >> + >> spin_lock_init(&channel->inbound_lock); >> spin_lock_init(&channel->lock); >> >> @@ -178,19 +180,47 @@ static void release_channel(struct work_struct *work) >> } >> >> /* >> - * free_channel - Release the resources used by the vmbus channel object >> + * vmbus_put_channel - Decrease the channel usage counter and release the >> + * resources when this counter reaches zero. >> */ >> -static void free_channel(struct vmbus_channel *channel) >> +void vmbus_put_channel(struct vmbus_channel *channel) >> { >> + unsigned long flags; >> >> /* >> * We have to release the channel's workqueue/thread in the vmbus's >> * workqueue/thread context >> * ie we can't destroy ourselves. >> */ >> - INIT_WORK(&channel->work, release_channel); >> - queue_work(vmbus_connection.work_queue, &channel->work); >> + spin_lock_irqsave(&channel->lock, flags); >> + if (atomic_dec_and_test(&channel->count)) { >> + channel->dying = true; >> + INIT_WORK(&channel->work, release_channel); >> + spin_unlock_irqrestore(&channel->lock, flags); >> + queue_work(vmbus_connection.work_queue, &channel->work); >> + } else >> + spin_unlock_irqrestore(&channel->lock, flags); >> +} >> +EXPORT_SYMBOL_GPL(vmbus_put_channel); >> + >> +/* vmbus_get_channel - Get additional reference to the channel */ >> +struct vmbus_channel *vmbus_get_channel(struct vmbus_channel *channel) >> +{ >> + unsigned long flags; >> + struct vmbus_channel *ret = NULL; >> + >> + if (!channel) >> + return NULL; >> + >> + spin_lock_irqsave(&channel->lock, flags); >> + if (!channel->dying) { >> + atomic_inc(&channel->count); >> + ret = channel; >> + } >> + spin_unlock_irqrestore(&channel->lock, flags); >> + return ret; >> } >> +EXPORT_SYMBOL_GPL(vmbus_get_channel); >> >> static void percpu_channel_enq(void *arg) >> { >> @@ -253,7 +283,7 @@ static void vmbus_process_rescind_offer(struct >> work_struct *work) >> list_del(&channel->sc_list); >> spin_unlock_irqrestore(&primary_channel->lock, flags); >> } >> - free_channel(channel); >> + vmbus_put_channel(channel); >> } >> >> void vmbus_free_channels(void) >> @@ -262,7 +292,7 @@ void vmbus_free_channels(void) >> >> list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) { >> vmbus_device_unregister(channel->device_obj); >> - free_channel(channel); >> + vmbus_put_channel(channel); >> } >> } >> >> @@ -391,7 +421,7 @@ done_init_rescind: >> spin_unlock_irqrestore(&newchannel->lock, flags); >> return; >> err_free_chan: >> - free_channel(newchannel); >> + vmbus_put_channel(newchannel); >> } >> >> enum { >> @@ -549,6 +579,7 @@ static void vmbus_onoffer_rescind(struct >> vmbus_channel_message_header *hdr) >> queue_work(channel->controlwq, &channel->work); >> >> spin_unlock_irqrestore(&channel->lock, flags); >> + vmbus_put_channel(channel); >> } >> >> /* >> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c >> index c4acd1c..d1ce134 100644 >> --- a/drivers/hv/connection.c >> +++ b/drivers/hv/connection.c >> @@ -247,7 +247,8 @@ void vmbus_disconnect(void) >> * Map the given relid to the corresponding channel based on the >> * per-cpu list of channels that have been affinitized to this CPU. >> * This will be used in the channel callback path as we can do this >> - * mapping in a lock-free fashion. >> + * mapping in a lock-free fashion. Takes additional reference to the >> + * channel, all users are supposed to do vmbus_put_channel(). >> */ >> static struct vmbus_channel *pcpu_relid2channel(u32 relid) >> { >> @@ -263,7 +264,7 @@ static struct vmbus_channel *pcpu_relid2channel(u32 >> relid) >> } >> } >> >> - return found_channel; >> + return vmbus_get_channel(found_channel); >> } >> >> /* >> @@ -297,6 +298,7 @@ struct vmbus_channel *relid2channel(u32 relid) >> } >> } >> } >> + found_channel = vmbus_get_channel(found_channel); >> spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags); >> >> return found_channel; >> @@ -360,6 +362,7 @@ static void process_chn_event(u32 relid) >> pr_err("no channel callback for relid - %u\n", relid); >> } >> >> + vmbus_put_channel(channel); >> } >> >> /* >> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h >> index b055e53..40d70f0 100644 >> --- a/drivers/hv/hyperv_vmbus.h >> +++ b/drivers/hv/hyperv_vmbus.h >> @@ -687,6 +687,10 @@ void vmbus_device_unregister(struct hv_device >> *device_obj); >> /* VmbusChildDeviceDestroy( */ >> /* struct hv_device *); */ >> >> +/* >> + * Get the channel by its relid. Takes additional reference to the channel so >> + * all users are supposed to do vmbus_put_channel() when they're done. >> + */ >> struct vmbus_channel *relid2channel(u32 relid); >> >> void vmbus_free_channels(void); >> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h >> index e73cfeb..c576d2d 100644 >> --- a/include/linux/hyperv.h >> +++ b/include/linux/hyperv.h >> @@ -649,6 +649,9 @@ struct vmbus_channel { >> /* Unique channel id */ >> int id; >> >> + /* Active reference count */ >> + atomic_t count; >> + >> struct list_head listentry; >> >> struct hv_device *device_obj; >> @@ -666,6 +669,7 @@ struct vmbus_channel { >> u8 monitor_bit; >> >> bool rescind; /* got rescind msg */ >> + bool dying; /* channel is dying */ >> >> u32 ringbuffer_gpadlhandle; >> >> @@ -907,6 +911,15 @@ extern int vmbus_recvpacket_raw(struct >> vmbus_channel *channel, >> >> extern void vmbus_ontimer(unsigned long data); >> >> +/* >> + * Decrease reference count for the channel. Frees the channel when its usage >> + * count reaches zero. >> + */ >> +extern void vmbus_put_channel(struct vmbus_channel *channel); >> + >> +/* Get additional reference to the channel */ >> +extern struct vmbus_channel *vmbus_get_channel(struct vmbus_channel >> *channel); >> + >> /* Base driver object */ >> struct hv_driver { >> const char *name; >> -- > > Hi Vitaly, > Thanks for the patchset! > I once tried to do the same work, but just didn't find enough time. :-) > > Please see my below questions: Thanks for your detailed review! I should have added 'RFC' to my series :-) > > Since we always get channel->lock when accessing channel->count, I don't > think the counter has to be of atomic_t? Agreed > > I suggest we add vmbus_get/put_channel() in vmbus_open/close(). > In this way, IMO patch 3/4 and 4/4 would be unnecessary? > I was thinking about that but I wasn't sure it is going to be enough for sub-channels. I'll look again and report. > In vmbus_exit(), I suspect we should swap the order of the 2 lines: > vmbus_free_channels(); > bus_unregister(&hv_bus); > I suppose in bus_unregister(), the .remove callbacks of all the drivers are > invoked, and vmbus_close() is invoked for every channel. Makes sense! > > BTW, I just noticed: alloc_channel() -> alloc_workqueue(,,max_active==1,): > due to max_active==1 here, a channel's process_offer() and > process_rescind_offer() is already serialized. > I'm not sure if this fact can be used to simplify the logic? Workers are serialized but e.g. vmbus_onoffer_rescind() is not serialized with them, e.g. there is no protection against channel dessapearance after we found it with relid2channel() (e.g. we're on failure path in vmbus_process_offer()). Anyway, we have another examples of (theoretically) possible issues and get/put workflow should simplify the syncronization. > Is the previous commit "Drivers: hv: vmbus: serialize Offer and Rescind offer" > necessary? It is as not checking work function is bad anyway as (in theory) we can run vmbus_process_offer() twice if vmbus_onoffer_rescind() runs before vmbus_process_offer() was started. > > BTW2, I think there is an unsafe race in the 2 paths (this is an existing issue, > not related to this patch of yours): > > vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel -> > list_for_each_entry(channel, pcpu_head, percpu_list) > > and > > vmbus_process_rescind_offer() -> percpu_channel_deq() -> > list_del(&channel->percpu_list) > > because the former is running in tasklet and hence can preempt the latter (running > in process context). percpu_channel_enq() should have the same > issue(?) As I said in my cover letter I don't believe my series solves all issues :-) > > We should add local_bh_disable()/enable() accordingly? > Can you please help to fix this? Yes, sure, I'll take a look. > > vmbus_exit() -> vmbus_free_channels() -> vmbus_put_channel() may have > a race with vmbus_process_rescind_offer()? > After vmbus_process_rescind_offer() -> vmbus_device_unregister(), the count > can already be the initial 1, later vmbus_free_channels() -> vmbus_put_channel() can > kfree the channel before vmbus_process_rescind_offer() completely > finishes? I was going to rewrite vmbus_free_channels() not only because of that. E.g. vmbus_get_outgoing_channel() presumes all lists (in both channel.sc_list and vmbus_connection.chn_list) are valid and we don't take care of that. Not that I believe vmbus_get_outgoing_channel() may happen when vmbus_free_channels() is active (and we call vmbus_free_channels() only if we unload the module) Thanks for your thorough review, it gave me further inspiration :-) -- Vitaly _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel