Hi, On Wed, Jun 19, 2013 at 09:35:38PM +0300, Grygorii Strashko wrote: > On 06/07/2013 11:51 PM, Kevin Hilman wrote: > >Grygorii Strashko <grygorii.strashko@xxxxxx> writes: > > > >>From: Kevin Hilman <khilman@xxxxxxxxxxxxxxxxxxx> > >> > >>Currently, runtime PM is used to keep the device enabled only during > >>active transfers and for a configurable runtime PM autosuspend timout > >>after an xfer. > >> > >>In addition to idling the device, driver's ->runtime_suspend() method > >>currently disables device interrupts when idle. However, on some SoCs > >>(notably OMAP4+), the I2C hardware may shared with other coprocessors. > >>This means that the MPU will still recieve interrupts if a coprocessor > >>is using the I2C device. To avoid this, also disable interrupts at > >>the MPU INTC when idling the device in ->runtime_suspend() (and > >>re-enable them in ->runtime_resume().) This part based on an original > >>patch from Shubhrajyoti Datta. NOTE: for proper sharing the I2C with > >>a coprocessor, this driver still needs hwspinlock support added. > >> > >>More over, this patch fixes the kernel boot failure which happens when > >>CONFIG_SENSORS_LM75=y: > >> > >>[ 2.251220] WARNING: at drivers/bus/omap_l3_noc.c:113 l3_interrupt_handler+0x140/0x184() > >>[ 2.257781] L3 custom error: MASTER:MPU TARGET:L4 PER2 > >>[ 2.264373] Modules linked in: > >>[ 2.268463] CPU: 0 PID: 2 Comm: kthreadd Not tainted 3.10.0-rc4-00034-g042dd60-dirty #64 > >>[ 2.270385] [<c001a80c>] (unwind_backtrace+0x0/0xf0) from [<c0017238>] (show_stack+0x10/0x14) > >>[ 2.286102] [<c0017238>] (show_stack+0x10/0x14) from [<c0040fd0>] (warn_slowpath_common+0x4c/0x68) > >>[ 2.295593] [<c0040fd0>] (warn_slowpath_common+0x4c/0x68) from [<c0041080>] (warn_slowpath_fmt+0x30/0x40) > >>[ 2.299987] [<c0041080>] (warn_slowpath_fmt+0x30/0x40) from [<c02c5ed0>] (l3_interrupt_handler+0x140/0x184) > >>[ 2.315582] [<c02c5ed0>] (l3_interrupt_handler+0x140/0x184) from [<c009ffb8>] (handle_irq_event_percpu+0x58/0x258) > >>[ 2.323364] [<c009ffb8>] (handle_irq_event_percpu+0x58/0x258) from [<c00a01f4>] (handle_irq_event+0x3c/0x5c) > >>[ 2.327819] [<c00a01f4>] (handle_irq_event+0x3c/0x5c) from [<c00a2f7c>] (handle_fasteoi_irq+0xbc/0x164) > >>[ 2.337829] [<c00a2f7c>] (handle_fasteoi_irq+0xbc/0x164) from [<c009f978>] (generic_handle_irq+0x20/0x30) > >>[ 2.357513] [<c009f978>] (generic_handle_irq+0x20/0x30) from [<c0014168>] (handle_IRQ+0x4c/0xac) > >>[ 2.366821] [<c0014168>] (handle_IRQ+0x4c/0xac) from [<c00085b4>] (gic_handle_irq+0x28/0x5c) > >>[ 2.375762] [<c00085b4>] (gic_handle_irq+0x28/0x5c) from [<c04f08a4>] (__irq_svc+0x44/0x5c) > >>[ 2.379821] Exception stack(0xdb085ec0 to 0xdb085f08) > >>[ 2.389953] 5ec0: 00000001 00000001 00000000 db07ea00 40000113 db2317a8 db084000 000002f1 > >>[ 2.389953] 5ee0: db07ea00 00000000 00000000 00000000 c04fd990 db085f08 c009170c c04f03e8 > >>[ 2.405609] 5f00: 20000113 ffffffff > >>[ 2.408355] [<c04f08a4>] (__irq_svc+0x44/0x5c) from [<c04f03e8>] (_raw_spin_unlock_irqrestore+0x34/0x44) > >>[ 2.418304] [<c04f03e8>] (_raw_spin_unlock_irqrestore+0x34/0x44) from [<c00403c0>] (do_fork+0xa4/0x2d4) > >>[ 2.427795] [<c00403c0>] (do_fork+0xa4/0x2d4) from [<c0040620>] (kernel_thread+0x30/0x38) > >>[ 2.437805] [<c0040620>] (kernel_thread+0x30/0x38) from [<c0063888>] (kthreadd+0xd4/0x13c) > >>[ 2.448364] [<c0063888>] (kthreadd+0xd4/0x13c) from [<c0013308>] (ret_from_fork+0x14/0x2c) > >>[ 2.448364] ---[ end trace da8cf92c433d1616 ]--- > >> > >>The root casue of crash is race between omap_i2c_runtime_suspend() > >> and omap_i2c_isr_thread() > >If there's a race here, then it's not the same problem which is > >described above, unless the CPU2 IRQ is happening because of a shared > >user of I2C, in which case it doesn't need any additional explanation. > no shared users here > >>CPU1: CPU2: > >>|-omap_i2c_xfer | > >> |- pm_runtime_put_autosuspend() | > >> |-timeout |-omap_i2c_isr() > >> |-return IRQ_WAKE_THREAD; > >> |-omap_i2c_runtime_suspend() | > >> |-omap_i2c_isr_thread() > >> |- oops is here - I2C module is in idle state > >If this is happening, I don't think it's related to $SUBJECT patch (but > >is probably hidden by it.) > I can remove "fix spurious IRQs: " from $SUBJECT. What do you think? > > > >Instead, what probably needs to happen in this is case is that > >omap_i2c_isr() should be doing a "mark last busy" to reset the > >autosuspend timeout. And, that should be done in a separate patch. this doesn't make sense... mark last busy is done after the I2C message is actually complete, so is pm_runtime_put_autosuspend() which is done following mark_last_busy. autosuspend timer shouldn't be firing since put won't be called until we're dead sure I2C message (all of them, in fact) are finalized. > Yes - from one side. From other side, this patch prevents such > situation to happen. > disable_irq(_dev->irq); - waits for any pending IRQ handlers for this > interrupt > to complete before returning. IRQ subsystem makes sure to disable our IRQ line while our handler is running and, because of IRQF_ONESHOT, IRQ will only be re-enabled after thread has finished. > If we are in .runtime_idle() callback - it means I2C transaction has > been finished we don't even use runtime_idle() and your statement is wrong, btw. If you have 200 pm_runtime_get() followed by 200 pm_runtime_put() (put is called only after 200 gets, no put-get ping-pong), your ->runtime_resume() gets called once, your ->runtime_suspend() gets called once but your ->runtime_idle() will get called 200 times. assuming I2C transaction has finished is wrong here. What if we decide to add pm_runtime_get() on hardirq handler and pm_runtime_put() to the end of the thread handler ? If you mask IRQs on ->runtime_idle(), you might miss some events. > As I mentioned in cover-latter this happens because PM runtime > auto-suspend isn't > working properly during the boot: > > [ 23.190002] omap_i2c 48350000.i2c: ==== i2c_add_numbered_adapter > [ 23.204681] omap_i2c 48350000.i2c: bus 3 rev0.11 at 400 kHz > [ 23.211669] omap_i2c 48350000.i2c: ====== rpm_suspend elapsed 0 last_busy 4294939616 > [ 23.219879] omap_i2c 48350000.i2c: ====== rpm_suspend expires 4294939716 last_busy 4294939616 autosuspend_delay 1000 > [ 23.219879] omap_i2c 48350000.i2c: ====== rpm_suspend expires 4294939700 > [ 23.237274] omap_i2c 48350000.i2c: ====== rpm_suspend mod_timer expires 4294939700 > --- the timer scheduled to 84 jiffes > [ 23.246185] omap_i2c 48350000.i2c: ==== pm_runtime_put_autosuspend > [ 23.253448] omap-dma-engine 4a056000.dma-controller: allocating channel for 62 > [ 23.261138] omap-dma-engine 4a056000.dma-controller: allocating channel for 61 > [ 23.269500] omap_i2c 48070000.i2c: ==== omap_i2c_runtime_resume > [ 23.275817] omap_i2c 48350000.i2c: ==== omap_i2c_runtime_suspend > --- and expired after ~40 ms > > So, there are no guaranty that this race will not happen even if I'll > do "mark last busy" > in HW IRQ handler (, because in high CPU load use case the threaded > IRQ can be delayed. then the bug is on pm_runtime time keeping, no ? Not really on the I2C driver. -- balbi
Attachment:
signature.asc
Description: Digital signature