> -----Original Message----- > From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx] > Sent: Wednesday, February 4, 2015 17:32 PM > To: Dexuan Cui > Cc: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx; Haiyang Zhang; linux- > kernel@xxxxxxxxxxxxxxx; Jason Wang > Subject: Re: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow > for vmbus channels > > Dexuan Cui <decui@xxxxxxxxxxxxx> writes: > > >> -----Original Message----- > >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx] > >> Sent: Wednesday, February 4, 2015 1:01 AM > >> To: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx > >> Cc: Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui; Jason Wang > >> Subject: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow > for > >> vmbus channels > >> > >> free_channel() function frees the channel unconditionally so we need to > make > >> sure nobody has any link to it. This is not trivial and there are several > >> examples of races we have: > >> > >> 1) In vmbus_onoffer_rescind() we check for channel existence with > >> relid2channel() and then use it. This can go wrong if we're in the middle > >> of channel removal (free_channel() was already called). > >> > >> 2) In process_chn_event() we check for channel existence with > >> pcpu_relid2channel() and then use it. This can also go wrong. > >> > >> 3) vmbus_free_channels() just frees all channels, in case we're in the middle > >> of vmbus_process_rescind_offer() crash is possible. > >> > >> The issue can be solved by holding vmbus_connection.channel_lock > everywhere, > >> however, it looks like a way to deadlocks and performance degradation. > Get/put > >> workflow fits here the best. > >> > >> Implement vmbus_get_channel()/vmbus_put_channel() pair instead of > >> free_channel(). > >> > >> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> > >> --- > >> drivers/hv/channel_mgmt.c | 45 > >> ++++++++++++++++++++++++++++++++++++++------- > >> drivers/hv/connection.c | 7 +++++-- > >> drivers/hv/hyperv_vmbus.h | 4 ++++ > >> include/linux/hyperv.h | 13 +++++++++++++ > >> 4 files changed, 60 insertions(+), 9 deletions(-) > >> > >> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c > >> index 36bacc7..eb9ce94 100644 > >> --- a/drivers/hv/channel_mgmt.c > >> +++ b/drivers/hv/channel_mgmt.c > >> @@ -147,6 +147,8 @@ static struct vmbus_channel *alloc_channel(void) > >> return NULL; > >> > >> channel->id = atomic_inc_return(&chan_num); > >> + atomic_set(&channel->count, 1); > >> + > >> spin_lock_init(&channel->inbound_lock); > >> spin_lock_init(&channel->lock); > >> > >> @@ -178,19 +180,47 @@ static void release_channel(struct work_struct > *work) > >> } > >> > >> /* > >> - * free_channel - Release the resources used by the vmbus channel object > >> + * vmbus_put_channel - Decrease the channel usage counter and release > the > >> + * resources when this counter reaches zero. > >> */ > >> -static void free_channel(struct vmbus_channel *channel) > >> +void vmbus_put_channel(struct vmbus_channel *channel) > >> { > >> + unsigned long flags; > >> > >> /* > >> * We have to release the channel's workqueue/thread in the vmbus's > >> * workqueue/thread context > >> * ie we can't destroy ourselves. > >> */ > >> - INIT_WORK(&channel->work, release_channel); > >> - queue_work(vmbus_connection.work_queue, &channel->work); > >> + spin_lock_irqsave(&channel->lock, flags); > >> + if (atomic_dec_and_test(&channel->count)) { > >> + channel->dying = true; > >> + INIT_WORK(&channel->work, release_channel); > >> + spin_unlock_irqrestore(&channel->lock, flags); > >> + queue_work(vmbus_connection.work_queue, &channel->work); > >> + } else > >> + spin_unlock_irqrestore(&channel->lock, flags); > >> +} > >> +EXPORT_SYMBOL_GPL(vmbus_put_channel); > >> + > >> +/* vmbus_get_channel - Get additional reference to the channel */ > >> +struct vmbus_channel *vmbus_get_channel(struct vmbus_channel *channel) > >> +{ > >> + unsigned long flags; > >> + struct vmbus_channel *ret = NULL; > >> + > >> + if (!channel) > >> + return NULL; > >> + > >> + spin_lock_irqsave(&channel->lock, flags); > >> + if (!channel->dying) { > >> + atomic_inc(&channel->count); > >> + ret = channel; > >> + } > >> + spin_unlock_irqrestore(&channel->lock, flags); > >> + return ret; > >> } > >> +EXPORT_SYMBOL_GPL(vmbus_get_channel); > >> > >> static void percpu_channel_enq(void *arg) > >> { > >> @@ -253,7 +283,7 @@ static void vmbus_process_rescind_offer(struct > >> work_struct *work) > >> list_del(&channel->sc_list); > >> spin_unlock_irqrestore(&primary_channel->lock, flags); > >> } > >> - free_channel(channel); > >> + vmbus_put_channel(channel); > >> } > >> > >> void vmbus_free_channels(void) > >> @@ -262,7 +292,7 @@ void vmbus_free_channels(void) > >> > >> list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) { > >> vmbus_device_unregister(channel->device_obj); > >> - free_channel(channel); > >> + vmbus_put_channel(channel); > >> } > >> } > >> > >> @@ -391,7 +421,7 @@ done_init_rescind: > >> spin_unlock_irqrestore(&newchannel->lock, flags); > >> return; > >> err_free_chan: > >> - free_channel(newchannel); > >> + vmbus_put_channel(newchannel); > >> } > >> > >> enum { > >> @@ -549,6 +579,7 @@ static void vmbus_onoffer_rescind(struct > >> vmbus_channel_message_header *hdr) > >> queue_work(channel->controlwq, &channel->work); > >> > >> spin_unlock_irqrestore(&channel->lock, flags); > >> + vmbus_put_channel(channel); > >> } > >> > >> /* > >> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c > >> index c4acd1c..d1ce134 100644 > >> --- a/drivers/hv/connection.c > >> +++ b/drivers/hv/connection.c > >> @@ -247,7 +247,8 @@ void vmbus_disconnect(void) > >> * Map the given relid to the corresponding channel based on the > >> * per-cpu list of channels that have been affinitized to this CPU. > >> * This will be used in the channel callback path as we can do this > >> - * mapping in a lock-free fashion. > >> + * mapping in a lock-free fashion. Takes additional reference to the > >> + * channel, all users are supposed to do vmbus_put_channel(). > >> */ > >> static struct vmbus_channel *pcpu_relid2channel(u32 relid) > >> { > >> @@ -263,7 +264,7 @@ static struct vmbus_channel > *pcpu_relid2channel(u32 > >> relid) > >> } > >> } > >> > >> - return found_channel; > >> + return vmbus_get_channel(found_channel); > >> } > >> > >> /* > >> @@ -297,6 +298,7 @@ struct vmbus_channel *relid2channel(u32 relid) > >> } > >> } > >> } > >> + found_channel = vmbus_get_channel(found_channel); > >> spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags); > >> > >> return found_channel; > >> @@ -360,6 +362,7 @@ static void process_chn_event(u32 relid) > >> pr_err("no channel callback for relid - %u\n", relid); > >> } > >> > >> + vmbus_put_channel(channel); > >> } > >> > >> /* > >> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h > >> index b055e53..40d70f0 100644 > >> --- a/drivers/hv/hyperv_vmbus.h > >> +++ b/drivers/hv/hyperv_vmbus.h > >> @@ -687,6 +687,10 @@ void vmbus_device_unregister(struct hv_device > >> *device_obj); > >> /* VmbusChildDeviceDestroy( */ > >> /* struct hv_device *); */ > >> > >> +/* > >> + * Get the channel by its relid. Takes additional reference to the channel so > >> + * all users are supposed to do vmbus_put_channel() when they're done. > >> + */ > >> struct vmbus_channel *relid2channel(u32 relid); > >> > >> void vmbus_free_channels(void); > >> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h > >> index e73cfeb..c576d2d 100644 > >> --- a/include/linux/hyperv.h > >> +++ b/include/linux/hyperv.h > >> @@ -649,6 +649,9 @@ struct vmbus_channel { > >> /* Unique channel id */ > >> int id; > >> > >> + /* Active reference count */ > >> + atomic_t count; > >> + > >> struct list_head listentry; > >> > >> struct hv_device *device_obj; > >> @@ -666,6 +669,7 @@ struct vmbus_channel { > >> u8 monitor_bit; > >> > >> bool rescind; /* got rescind msg */ > >> + bool dying; /* channel is dying */ > >> > >> u32 ringbuffer_gpadlhandle; > >> > >> @@ -907,6 +911,15 @@ extern int vmbus_recvpacket_raw(struct > >> vmbus_channel *channel, > >> > >> extern void vmbus_ontimer(unsigned long data); > >> > >> +/* > >> + * Decrease reference count for the channel. Frees the channel when its > usage > >> + * count reaches zero. > >> + */ > >> +extern void vmbus_put_channel(struct vmbus_channel *channel); > >> + > >> +/* Get additional reference to the channel */ > >> +extern struct vmbus_channel *vmbus_get_channel(struct vmbus_channel > >> *channel); > >> + > >> /* Base driver object */ > >> struct hv_driver { > >> const char *name; > >> -- > > > > Hi Vitaly, > > Thanks for the patchset! > > I once tried to do the same work, but just didn't find enough time. :-) > > > > Please see my below questions: > > Thanks for your detailed review! I should have added 'RFC' to my series :-) > > > > > Since we always get channel->lock when accessing channel->count, I don't > > think the counter has to be of atomic_t? > > Agreed > > > > > I suggest we add vmbus_get/put_channel() in vmbus_open/close(). > > In this way, IMO patch 3/4 and 4/4 would be unnecessary? > > > > I was thinking about that but I wasn't sure it is going to be enough for > sub-channels. I'll look again and report. > > > In vmbus_exit(), I suspect we should swap the order of the 2 lines: > > vmbus_free_channels(); > > bus_unregister(&hv_bus); > > I suppose in bus_unregister(), the .remove callbacks of all the drivers are > > invoked, and vmbus_close() is invoked for every channel. > > Makes sense! > > > > > BTW, I just noticed: alloc_channel() -> alloc_workqueue(,,max_active==1,): > > due to max_active==1 here, a channel's process_offer() and > > process_rescind_offer() is already serialized. > > I'm not sure if this fact can be used to simplify the logic? > > Workers are serialized but e.g. vmbus_onoffer_rescind() is not > serialized with them, e.g. there is no protection against channel > dessapearance after we found it with relid2channel() (e.g. we're on > failure path in vmbus_process_offer()). > > Anyway, we have another examples of (theoretically) possible issues and > get/put workflow should simplify the syncronization. > > > Is the previous commit "Drivers: hv: vmbus: serialize Offer and Rescind offer" > > necessary? > > It is as not checking work function is bad anyway as (in theory) we can > run vmbus_process_offer() twice if vmbus_onoffer_rescind() runs before > vmbus_process_offer() was started. > > > > > BTW2, I think there is an unsafe race in the 2 paths (this is an existing issue, > > not related to this patch of yours): > > > > vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel -> > > list_for_each_entry(channel, pcpu_head, percpu_list) > > > > and > > > > vmbus_process_rescind_offer() -> percpu_channel_deq() -> > > list_del(&channel->percpu_list) > > > > because the former is running in tasklet and hence can preempt the latter > (running > > in process context). percpu_channel_enq() should have the same > > issue(?) > > As I said in my cover letter I don't believe my series solves all issues :-) > > > > > We should add local_bh_disable()/enable() accordingly? > > Can you please help to fix this? > > Yes, sure, I'll take a look. > > > > > vmbus_exit() -> vmbus_free_channels() -> vmbus_put_channel() may have > > a race with vmbus_process_rescind_offer()? > > After vmbus_process_rescind_offer() -> vmbus_device_unregister(), the count > > can already be the initial 1, later vmbus_free_channels() -> > vmbus_put_channel() can > > kfree the channel before vmbus_process_rescind_offer() completely > > finishes? > > I was going to rewrite vmbus_free_channels() not only because of > that. E.g. vmbus_get_outgoing_channel() presumes all lists (in both > channel.sc_list and vmbus_connection.chn_list) are valid and we don't > take care of that. Not that I believe vmbus_get_outgoing_channel() > may happen when vmbus_free_channels() is active (and we call > vmbus_free_channels() only if we unload the module) > > > Thanks for your thorough review, it gave me further inspiration :-) > > -- > Vitaly Thanks, Vitaly! I know it's tricky here. :-) Thanks, -- Dexuan _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel