Re: [PATCH 3/3] pciehp: Fix race condition handling surprise link-down

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 07, 2016 at 07:04:33PM -0500, Keith Busch wrote:
> On Wed, Dec 07, 2016 at 05:40:54PM -0600, Bjorn Helgaas wrote:
> > On Sat, Nov 19, 2016 at 12:32:47AM -0800, Ashok Raj wrote:
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -182,6 +182,7 @@ static void pciehp_power_thread(struct work_struct *work)
> > >  	switch (info->req) {
> > >  	case DISABLE_REQ:
> > >  		mutex_lock(&p_slot->hotplug_lock);
> > > +		p_slot->state = POWEROFF_STATE;
> > 
> > It sounds right that p_slot->state should be protected.
> > 
> > It looks like handle_button_press_event() and
> > pciehp_sysfs_enable_slot() hold p_slot->lock while updating
> > p_slot->state.
> > 
> > You're setting "state = POWEROFF_STATE" while holding
> > p_slot->hotplug_lock (not p_slot->lock).  Four lines down, we set
> > "state = STATIC_STATE", but this time we're holding p_slot->lock.
> > 
> > What is the difference between the p_slot->lock and
> > p_slot->hotplug_lock?  Do we need both?  How do we know which one to
> > use?
> > 
> > I'm very confused.
> 
> This is _very_ confusing. :)
> 
> The p_slot->hotplug_lock serializes a power off and a power on event. We
> only want to set the state when one of those events gets to execute
> rather than when the event is queued. Changing the state when the event
> is queued can trigger a never ending up-down sequence.
> 
> It currently looks safe to nest the p_slot->lock under the
> p_slot->hotplug_lock if that is you recommendation.

I'm not making a recommendation; that would take a lot more thought
than I've given this.

There are at least three locks in pciehp:

  struct controller.ctrl_lock
  struct slot.lock
  struct slot.hotplug_lock

There shouldn't really be any performance paths in pciehp, so I'm
pretty doubtful that we need such complicated locking.

> Alternatively we could fix this if we used an ordered work queue for
> the slot's work, but that is a significantly more complex change.

You mean we can currently execute things from the work queue in a
different order than they were enqueued?  That sounds ... difficult to
analyze, to say the least.

I don't know much about work queues, and Documentation/workqueue.txt
doesn't mention alloc_ordered_workqueue().  Is that what you are
referring to?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]