Re: [PATCH v3 03/10] usb: find internal hub tier mismatch via acpi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 21, 2014 at 2:05 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 21 Jan 2014, Dan Williams wrote:
>
>> > > We want to:
>> > >
>> > > 1/ prevent khubd from running while reset is in progress
>> >
>> > This is the open question mentioned above, if you add in the
>> > requirement that we also want to prevent a reset from starting while
>> > khubd is running (except when khubd performs the reset itself).
>> >
>> > > 2/ prevent khubd from seeing an intermediate device state during
>> > > usb_port_resume
>> >
>> > khubd doesn't pay all that much attention to device states; mostly it
>> > is concerned with port statuses.  We don't want khubd to see an
>> > intermediate port status during usb_port_resume.  Basically that means
>> > we don't want khubd to run while usb_port_resume is in progress, and we
>> > don't want usb_port_resume to start while khubd is running unless khubd
>> > performs the resume itself.
>>
>> Yes, it does not pay all that much attention to device states, but it is
>> critical that it not read ->state while usb_port_resume() is changing
>> it.  Specifically this hole in hub_port_connect_change():
>
> It doesn't matter.  We don't want khubd and usb_port_resume to run at
> the same time, for other reasons, and therefore we don't have to worry
> about khubd seeing an intermediate device state caused by
> usb_port_resume.

I won't argue this anymore.  Protecting khubd against portstatus
changes makes the device state moot.

[..]
>> > > All these scenarios to point to problems with ->device_state collisions
>> > > not necessarily the port state.
>> >
>> > I don't think so, for the reason mentioned above.
>>
>> I'll rephrase, checking USB_STATE_SUSPENDED needs usb_port_resume()
>> synchronization.
>
> Which we need anyway, so the device_state aspect doesn't matter.  (You
> may have overlooked the port-status tests in hub_handle_remote_wakeup.)

No, I considered it.

I think we only need protection vs usb_port_{suspend|resume} in three cases

1/ hub_handle_remote_wakeup() where we check if the link/device is suspended

2/ hub_connect_change() where we check for connection

3/ in hub_events() where we check if warm reset is required as
usb_port_resume may also want to perform warm resets

->state and portstatus are intertwined in these checks, but I can see
your point that it really is more about portstatus.  Sychronizing
->state comes along for the ride.

>> I propose we use usb_port pm runtime synchronization to block khubd when
>> we know the portstatus is not valid (powered off),
>
> Suppose, when you power the port back on, you find that the device is
> no longer plugged in.  How do you prevent the port from being powered
> off again before khubd has a chance to finalize the device?  I guess
> you would have to do a pm_runtime_get on the port whenever the port's
> bit in hub->change_bits gets set.

Simple, take a pm reference in usb_port_runtime_resume and flag the
port as "needs child resume".  Drop the reference once the child
resume completes.  The port is !active until it returns from
usb_port_runtime_resume, and khubd performs a pm barrier

>>  and we need a new
>> lock to synchronize with usb_device pm operations before checking the
>> device state.
>
> Forget about the device state.  It's a distraction.

Done.

>> I think we agree that khubd needs to not look at the portstatus while
>> the port is down.  pm runtime synchronization takes care of that and
>> prevents khubd from starting while the port is inactive.  With that in
>> place we can now make requests in usb_port_runtime_resume() that are
>> later serviced in khubd when the port is marked active (pm runtime
>> enforces that ->runtime_resume() return 0 before the port is marked
>> active).
>
> How do you know, when you make the request, that khubd won't start
> running while the port is still down?  If that happened then khubd
> would know to avoid looking at the port, of course, but the request
> would be used up and lost.

Per above it requires khubd to barrier pm operations to finish
bringing the port up.  It does this while holding a reference to
ensure the port does not go down while checking.  I also expect a
reference to be taken when the request is queued and dropped when
khubd services it.

> With a lock, this problem doesn't arise.

A lock for this scenario would effectively duplicate dev->power.lock,
or be wider than necessary if I understand where you are considering
holding it.

>>  This mechanism can be used for forcing at least one child
>> resume for each port resume, and rate-limiting port power requests.
>
> What's the connection with rate-limiting?  Under what circumstances
> would you need to rate-limit the port power changes?

Only testing scenarios where the port is turning on and off at a
potentially high frequency.  Forcing child resume limits that period
to the latency of device recovery.

>> For the next case, fixing khubd vs device_state changes, we can't use
>> pm_runtime synchronization because khubd is expected to be able to
>> resume a suspended usb_device.  So we need to prevent khubd from
>> evaluating the device_state (USB_STATE_SUSPENDED) while usb_device pm
>> operations are in flight.  In the case of dpm_{suspend|resume} khubd is
>> frozen so there's no collision, in the case of rpm_{suspend|resume} we
>> need to take a lock.
>
> Agreed.
>
>> Let's wrap usb_runtime_{resume|suspend} in a new usb_device->state_lock
>> and then take that lock in khubd whenever it needs to check ->state or
>> portstatus values associated with suspend.  Specifically:
>>
>> @@ -1765,10 +1774,14 @@ int usb_runtime_resume(struct device *dev)
>>         struct usb_device       *udev = to_usb_device(dev);
>>         int                     status;
>>
>> -       /* Runtime resume for a USB device means resuming both the device
>> -        * and all its interfaces.
>> +       /* Runtime resume for a USB device means resuming both the
>> +        * device and all its interfaces.  We lock to synchronize the
>> +        * device state with khubd
>>          */
>> +       mutex_lock(&udev->state_lock);
>>         status = usb_resume_both(udev, PMSG_AUTO_RESUME);
>> +       mutex_unlock(&udev->state_lock);
>> +
>>         return status;
>>  }
>>
>> The lock hangs off the usb_device rather than the the usb_port since it
>> is meant to prevent unintended disconnects.
>
> In principle, the new lock could go in either the device or the port.
> IMO it makes more sense to put the lock in the port, because it is
> meant to protect against port-status collisions.
>

Ok, no functional difference.

>>  Any portstatus vs khubd
>> collisions are handled by pm runtime synchronization since there is no
>> expectation that khubd resumes a usb_port.
>
> What?!  I don't understand this at all.

Vocab disagreement.  I was still differentiating "portstatus
collisions" as static conditions where khubd should not run at all on
a port because that port is off vs "state collisions" where khubd
needs synchronize with usb_port_resume and dynamic changes.

That's the last time I reference "state collisions", I promise.

We can use pm sync for the static portstatus collisions and the lock
for the dynamic portstatus collisions.  The "request khubd take
action" mechanism is used when transitioning out of the static
condition.

> The synchronization you were
> talking about applies to usb_port_runtime_resume; it doesn't protect
> khubd from colliding with port-status changes in usb_port_resume.
> That's why we need the new lock.

Yes, still in agreement.

> Also, I don't think we should take the tal.  Once the port is out of the
> suspended state and the status is all cleaned up, the protection is no
> longer needed.  There's no reason to keep holding the lock while we
> invoke the resume routines in all the interface drivers.  (For that
> matter, there's no reason to hold the lock at all when resuming a root
> hub.)

Ok, we can push it down further.  I considered it could be as coarse
as the device lock, but once the usb_device is resumed you are right
it has served its purpose.

> It's important to establish the proper order of locking.  In an earlier
> email, I pointed out that we will sometimes need to acquire the new
> lock while holding the device lock.  This remains true.

Yes, we always need to go device_lock then 'new_lock'.  Deadlock
avoidance 101.  I rewrote libsas/libata error handling synchronization
and suspend/resume support, so I'm aware of how hairy these things can
get.

>> Once khubd has made a determination that it wants to resume or reset a
>> device it will of course need to drop the state lock.  Those actions
>> will implicitly take the state lock.
>
> Let's call it the status lock, not the state lock, because what it
> protects is the port status.

"status_lock" it is.

>> Here's the patch (to replace patch 9), now tested.  There are of course
>> cleanups that can follow this, but holding off until we have agreement
>> on closing these races.
>
> You're doing it again!  I just told you not to post patches while
> discussing overall strategy.

I respect the point, and agree that some more consensus before code
would have been beneficial in places.  I also did not see wide
disagreement in the proposed strategy and thought it worth my time to
actually test out a draft (it's not as if we are talking a major
overhaul).  Certainly would have been less provactive to say "I gave
it a shot and seems to work here" instead of including the patch, but
I honestly did not expect that we still had significant distance in
our positions.  I do appreciate the discussion.  Forgive me if I am
trying your patience.

That said, I do believe you are still skeptical of pm runtime
synchronization.  We could certainly take the lock when
setting/clearing ->power_is_on, and disable khubd from processing
!power_is_on ports.  However, that solution needs a way to flush a
currently running khubd, or prevent a port from suspending while khubd
is active (here comes pm runtime sync again)

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux