On Friday, October 08, 2010, Kevin Hilman wrote: > "Rafael J. Wysocki" <rjw@xxxxxxx> writes: > > > On Friday, October 08, 2010, Kevin Hilman wrote: > >> "Rafael J. Wysocki" <rjw@xxxxxxx> writes: > >> > >> > On Thursday, October 07, 2010, Alan Stern wrote: > >> >> On Thu, 7 Oct 2010, Kevin Hilman wrote: > >> >> > >> >> > My confusion is not about the use of spinlocks, it's a question of what > >> >> > is being busy-waited for, and the thread that is being waited for is > >> >> > going to complete when interrupts are disabled. > >> >> > > >> >> > Sorry to be dense, but can you (re)summarize what you're proposing as I > >> >> > think I'm getting mixed up with all the various options we've been > >> >> > tossing around. > >> >> > > >> >> > If it can work, I'm certainly in favor of a busy-wait approach as it > >> >> > really ensures that sync requests are handled quickly. > >> >> > >> >> Okay, here's the story in a nutshell. Allowing a subsystem's or > >> >> driver's runtime-PM callbacks to run with interrupts disabled faces two > >> >> obstacles: > >> >> > >> >> (1): We don't want two different CPUs to run callbacks for the > >> >> same device at the same time. So if a callback is already > >> >> running on one CPU (i.e., if the device's runtime status is > >> >> either SUSPENDING or RESUMING) then another CPU can't be > >> >> allowed to invoke a callback. > >> >> > >> >> Thus, you can't do a synchronous pm_runtime_resume_irq() > >> >> if the device is in the middle of a suspend or resume > >> >> operation. We're left with two choices: Fail the synchronous > >> >> call and force the driver to defer matters to a workqueue > >> >> (possibly masking an IRQ line in the meantime), or busy-wait > >> >> until the concurrent operation finishes. > >> >> > >> >> If the PM core simply avoids releasing dev->power.lock before > >> >> invoking the runtime_suspend or runtime_resume callback, the > >> >> end result is almost the same as with busy-waiting. > >> > > >> > This is slightly more complicated, because suspend is a bit different from > >> > resume. I think it's generally acceptable to refuse to do a "fast path" suspend > >> > if the device state is not RPM_ACTIVE at the moment, but refusing to do > >> > a "fast path" resume may be more problematic (more on that later). > >> > > >> >> (2): In general we can't resume a device if its parent is suspended. > >> >> If the parent's runtime_resume routine needs to run with > >> >> interrupts enabled then there's no way to resume the device > >> >> while keeping interrupts disabled. > >> >> > >> >> Possible solutions involve, again, deferring matters to a > >> >> workqueue, or else simply not allowing the situation to arise > >> >> in the first place (forbid a device to have interrupt-disabled > >> >> callbacks unless its parent does too or the parent doesn't use > >> >> runtime PM at all). > >> >> > >> >> In general I'm against the solutions that require a workqueue. > >> > > >> > OK > >> > > >> >> Raphael appears to favor workqueues for (1) and be against them for (2). > >> > > >> > The particular case we need to handle (I think) is that some devices need to be > >> > resumed "atomically" as a part of bringing a CPU package out of a C-like-state > >> > (Kevin, is that correct?). > >> > >> Correct, but there is more than one problematic case. > >> > >> Another problem is in devices that are not atomic with C-states, but > >> receive wakeup IRQs while they're suspended and thus need to > >> pm_runtime_get() from their ISRs, without having an essentially > >> unbounded interrupt latency before the ISR can be handled. > >> > >> Consider the following sequence for a given device > >> > >> - device: ->runtime_suspend() /* registers not accessible */ > >> - pm_idle > >> - IRQs disabled > >> - idle > >> - wakeup event > >> - idle loop finishes > >> - IRQs enabled > >> - device: ->runtime_resume() > >> > >> Now, consider the 'wakeup event' is an IRQ for that device. That means > >> as soon as IRQs are (re)enabled (and before some other activity has > >> triggered a ->runtime_resume), the device's ISR is called. Since it is > >> RPM_SUSPENDED, it's registers are not accessible so it needs to do a > >> pm_runtime_get(). > >> > >> The case that raised this initially was a GPIO IRQ demux, where the > >> initial ISR is just a chained handler whichfigures out which GPIO > >> fired and then call genirq for that IRQ. > >> > >> All that being said, while I've currently described these as two > >> different problems, the second one could be solved by converting it to > >> the first one. IOW, make GPIO be one of those devices that are > >> suspended/resumed atomically with the CPU so that it is guaranteed to be > >> awake when IRQs are re-enabled. > >> > >> While that would work, I've been trying to move in the opposite > >> direction by trying to dis-connect them from CPU idle. > >> > >> > In that case not only we can't defer the resume (by > >> > using a workqueue), but also we have to finish the resume within strict time > >> > constraints (the time to leave the given C-like-state is one of the parameters > >> > used by cpuidle governors to decide which state to choose). > >> > > >> > On the other hand, busy waiting (by looping) in the case the state is > >> > RPM_RESUMING is as though the callbacks were executed under a spinlock and we > >> > happened to wait on it. I'd prefer to avoid that. Moreover, if the state is > >> > RPM_SUSPENDING at the time a "fast path" resume is started, we are almost > >> > guaranteed to violate the time constraints (by busy waiting for the suspend to > >> > complete and carrying out the resume). > >> > > >> > So, here's an idea: > >> > > >> > First, let's introduce two flags in struct dev_pm_info, irq_safe and > >> > fast_resume. The former has to be set so that the things above work. > >> > > >> > Second, let's add a new flag to pass to __pm_runtime_{suspend|resume}(), > >> > RPM_FAST_RESUME such that if it is passed to __pm_runtime_suspend(), it > >> > causes fast_resume to be set for the given device (if the status is already > >> > RPM_SUSPENDED, it just sets the flag, otherwise it suspends the device > >> > normally and then sets the flag). Now, if fast_resume is set, > >> > __pm_runtime_resume() has to be passed RPM_FAST_RESUME, or it will > >> > fail (it also will fail if passed RPM_FAST_RESUME and fast_resume isn't > >> > set). In that case the status has to be RPM_SUSPENDED or RPM_RESUMING > >> > (otherwise fast_resume won't be set) and if it is RPM_RESUMING, this means > >> > that the other resume has been called with RPM_FAST_RESUME too (it would > >> > fail otherwise), so it's fine to busy loop until the status is RPM_SUSPENDED > >> > (it's the caller's problem to avoid that). RPM_FAST_RESUME causes > >> > __pm_runtime_resume() to avod turning interrupts on. > >> > > >> > If necessary, there may be a flag for __pm_runtime_suspend() that will > >> > carry out "atomic" suspend, but I'm not sure if that's necessary. Kevin? > >> > >> I think we'll need both "atomic" suspend & resume. > >> > >> One of the devices I'm currently managing in this atomic fashion (inside > >> pm_idle) is the UARTs (of course, I'm not currently using runtime PM for > >> this, I'm calling platform-specific hooks directly since IRQs are > >> disabled.) For the UARTs I'm using both atomic suspend and resume, > >> primarily for debug so I am sure UARTs are awake to see any panics > >> during the last parts of the suspend/idle process and the early parts of > >> the resume path. This UART case is a bit of a hack until the driver is > >> converted to runtime PM, but illustrates some debug related reasons we > >> might also need atomic suspend and resume. > >> > >> We also have some workarounds for hardware errata that require > >> putting certain devices in a specific state just before (and after) > >> idle. The drivers for these devices will need both suspend and resume. > > > > OK, thanks for the info. > > > > Do you need "normal" resume to work after "atomic" suspend, or is it > > sufficient that "atomic" suspend will require "atomic" resume? > > hmm... while I'm definitely needing an "atomic" resume after a "normal" > suspend, for now I can't think of a case where a "normal" resume would > be needed after an "atomic" suspend. All the cases where I'm currently > using an atomic suspend also have a corresponding atomic resume. > > As I write this, it wouldn't surprise me down the road to find some HW > errata that requires the device in a specific state only before idle, > but not caring about the state after idle. That would be a case where > an atomic suspend would be needed, but the resume would be "normal" > sometime later when the device is next needed. Still, I think it wouldn't hurt if "fast resume" is used in that case. I'm trying to figure out if atomic suspend should always imply atomic resume. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html