On Fri, 27 Jan 2012 15:58:40 -0700 (MST) Paul Walmsley <paul@xxxxxxxxx> wrote: > On Sat, 28 Jan 2012, NeilBrown wrote: > > > Here's a theory: perhaps the MPU powerdomain is hitting a low-power state > > > while waiting for an HDQ interrupt. When the MPU powerdomain is in a low > > > power state, so is the MPU interrupt controller, so the only way that the > > > MPU can wake up is if the HDQ can issue a wakeup event to the PRCM. And I > > > don't see any evidence that the HDQ is capable of doing this, based on the > > > HDQ sections of the TRM. What a huge energy waste, if true. > > > > In the config I am testing MPU only goes to RET, never OFF. Same for CORE. > > That qualifies as a low-power state for the MPU :-) > > In MPU RET, at least in theory, the MPU INTC will not be operational, and > thus unable to respond to interrupts. > > > > Maybe try something like the following patch -- compile-tested only here. > > > > > > If this works, you might want to try dropping this patch and using the pad > > > mux to set a wakeup event on the 1-wire pad when the signal goes low. > > > That might be a more power-efficient approach. You may still have to use > > > some PM QoS request there to ensure that the HDQ can wake up fast enough > > > to see the pulse, but the constraint shouldn't need to be as ludicrously > > > low as it is in the following patch. > > > > Doesn't work - crashes :-( > > > > pm_qos_power_init is defined as a late_initcall, so you cannot call > > pm_qos_update_request until after all the initcalls have run. > > But with your patch, the probe of the bq27000 triggers a read of the registers > > which tries to update_request - and it goes 'bang'. > > Hmm, okay. Could you please remove the register read from the HDQ probe > and see if that makes a difference? Sorry, I misdiagnosed. The problem was that the pm_qos_request structure needs to be zeroed before pm_qos_add_request() or it complains, doesn't initialise it, and future calls can crash. So with 'kzalloc' in place of 'kmalloc' where hdq_data is allocated, the qos patch does make my problem go away. It feels a bit heavy-handed though. > > > I really think the problem is the CORE pwrdm gating a clock because no module > > says it needs it - i.e. nothing to do with MPU at all. > > Until pm_runtime_put*() is called, the usecount of hdq_fck will still be > non-zero. So the CORE shouldn't be able to gate it or hdq_ick at that > time, and thus should not be able to enter idle. Hence the question about > where the problems occur: whether they occur in the middle of the > transaction or when the HDQ clocks are disabled. > > > We want to keep CORE active when an HDQ transaction is happening, but MPU is > > welcome to go to sleep. I don't think you can express that with 'qos'. I > > think it needs some omap-specific machinery. > > The OMAP PRCM hardware should keep the CORE* clkdms active when the > hdq_fck is enabled. So it's possible there could be a PRCM silicon bug > that doesn't take hdq_fck into account when determining whether the CORE_* > clkdms are inactive. That isn't the way I understand the TRM and the code. I admit there is a lot that I don't understand, but I've really drilled down in this area and my understanding aligns with what I see experimentally, so.... clkdm_allow_idle(struct clockdomain *clkdm) tells the hardware to allow that clkdm to go idle, and it does so with no reference to whether any clk in the clkdm is current enabled. This seems to be called conditional only on which IDLE state is being entered. Any state below C1 is entered with clkdm_allow_idle in force. When the hardware has been told that it can idle a clockdomain, it will handshake with each module that uses the clockdomain. i.e. it doesn't check if the clock is enabled, it asks the module directly. The module answers according to the SIDLE (slave idle) setting. This can be set to - always say "no, don't idle, I'm busy" - always say "yes, force me idle, I don't care" - 'smartidle' where the response is based on the internal state of the module. A good example of smartidle is the UART. If it hasn't seen a character in or out for about 10 (I think) character times it will say "yes, idle the clock", else it will say "no". When RX then goes low it will handshake back to say "Hey, I'm busy all of a sudden, can I have the clock back please" which it then gets in time to count off the start bit and then read the rest of the incoming byte.(*) However HDQ doesn't have SIDLE. So the PRCM doesn't handshake with it. It just assumes that HDQ will say "yes, force me idle, I don't care". The diversion about UARTs is relevant because when the UARTs have SIDLE set to 'no' I don't see the problem (they keep the clock domain active). It is only when the UARTs set their SIDLE to smartidle that the problem appears. So: if the HDQ driver wants the clkdm to stay on it has to explicitly arrange it in software by appropriate clkdm_deny_idle() calls. From the "System Power Management and Wakeup" section of the HDQ chapter of the TRM: As part of the system-wide power-management scheme, the HDQ/1-Wire module can go into idle state at the request of the PRCM module (for more information, see Chapter 3, Power, Reset, and Clock Management). However, the HDQ/1-Wire module does not support handshake protocol with the PRCM. and Software must ensure correct clock management. It also mentions the AUTO_HDQ bit of PRCM.CM_AUTOIDLE1_CORE. This had me confused for a while, but I think it only affects the iclk. I don't think the iclk is the problem. It is the fclk that is disappearing because the clkdm that feeds it is being turned off. > > > I can 'fix' the problem simply by making sure > > > > pwrdm_for_each_clkdm(core_pd, _cpuidle_deny_idle); > > > > runs in omap3_enter_idle whenever HDQ is active. > > Hmm, that does suggest that it's not wakeup related. > > > One of the reasons that I think it is a clock problem rather than just > > missing a wakeup event is that once the problem starts happening I > > cannot recovery without rebooting. i.e. even if I tell the UARTs to keep > > the clocks on permanently and keep the CPUIDLE state at 0, the HDQ > > doesn't start working again. It has clearly become confused. The HDQ > > doco makes a point of saying that you shouldn't issue any commands > > (except 'enable clock') when the clock is disabled. I think we end up > > doing that and it gets confused and cannot recover. > > > > I note that there is an ad-hoc dependency between the camera and various > > power states as well. Maybe we need a little bit of infrastructure so > > that camera can say "Keep CORE and MPU on" (or whatever it needs) and > > HDQ can say "Keep CORE on". ??? > > I'm not familiar with the camera problems, but in the HDQ case, this > should only be needed if a silicon bug exists. Which is certainly > possible; we've seen this problem with one other IP block in the past. > Based on a quick glance at the errata, I don't see anything related to the > HDQ, but that doesn't really mean anything. > > In any case, we should be able to work around this via the hwmod layer and > a special flag, if the problem really is a PRCM bug. This will depend on > the functional powerstate conversion. My idea of a fix would be a secondary reference count on each clkdm. When a clk that doesn't have an associated SIDLE setting is activated we increase this reference count (and decrease it when the clk is deactivated). While the count is non-zero we deny_idle for that clkdm. Or something like that. Thanks for listening :-) NeilBrown > > Thanks for the detailed test reports. > > - Paul (*) Another note about UARTs. In 3.2 (haven't checked 3.3) the driver not only sets SMARTIDLE but also stops the clocks when things are idle. So when RX goes low and the UART wakes us up it doesn't get a clock straight away but we have to wait for code to enable the clock. In my experiments, this is fast enough to stay in-sync for speeds up to 38400 (I think). Above there we miss the first bit and get corrupt characters. I think that if the UART is expecting data it should not disable the clock but should use SMARTIDLE to let it turn off, and then turn back on again quickly.... but maybe 3.3 gets that right. NB
Attachment:
signature.asc
Description: PGP signature