Re: arm-soc + rmk's tree boot failure on OMAP4430SDP

On Fri, Mar 16, 2012 at 05:47:06PM -0700, Tony Lindgren wrote:
> Hi,
> 
> Adding Tomi to this thread.
> 
> * Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx> [120316 16:14]:
> > Sometime during the last week, the OMAP4430SDP stopped booting - it now
> > stops with no kernel messages output:
> > 
> > http://www.arm.linux.org.uk/developer/build/result.php?type=boot&idx=69
> > 
> > The previously booted version:
> > 
> > http://www.arm.linux.org.uk/developer/build/result.php?type=boot&idx=57
> > 
> > worked fine - though this log will only be available for about 3 hours.
> > 
> > I've re-checked my tree, and the OMAP4430SDP boots fine, so it's either
> > breakage coming from arm-soc or a result of merging the two trees
> > together.
> 
> Based on initcall_debug with current linux-next, it seems to hang at
> omap_dss_init2. And leaving out CONFIG_OMAP2_DSS makes devices boot
> again.

Well, if I put my printk hack in, then I get:

Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
<6>msgmni has been set to 995
<6>io scheduler noop registered
<6>io scheduler deadline registered
<6>io scheduler cfq registered (default)
<3>INFO: task swapper/0:1 blocked for more than 120 seconds.
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>swapper/0       D<c> c03237b0 <c>    0     1      0 0x00000000
Backtrace: 
[<c03233b4>] (__schedule+0x0/0x4d0) from [<c03239c4>] (schedule+0x74/0x78)
[<c0323950>] (schedule+0x0/0x78) from [<c0322404>] (__mutex_lock_slowpath+0x140/0x19c)
[<c03222c4>] (__mutex_lock_slowpath+0x0/0x19c) from [<c032248c>] (mutex_lock+0x2c/0x40)
[<c0322460>] (mutex_lock+0x0/0x40) from [<c01f8f68>] (__driver_attach+0x48/0x90)
 r4:df4f2808
[<c01f8f20>] (__driver_attach+0x0/0x90) from [<c01f77b0>] (bus_for_each_dev+0x58/0x98)
 r6:c0475634 r5:c01f8f20 r4:00000000
[<c01f7758>] (bus_for_each_dev+0x0/0x98) from [<c01f8c58>] (driver_attach+0x20/0x28)
 r7:df472780 r6:c0475634 r5:c0475634 r4:c045ea28
[<c01f8c38>] (driver_attach+0x0/0x28) from [<c01f8034>] (bus_add_driver+0xb4/0x230)
[<c01f7f80>] (bus_add_driver+0x0/0x230) from [<c01f9614>] (driver_register+0xac/0x138)
[<c01f9568>] (driver_register+0x0/0x138) from [<c01fa678>] (platform_driver_register+0x4c/0x60)
 r8:c046c0d8 r7:c04755a4 r6:c04755a4 r5:c045ea30 r4:c045ea28
[<c01fa62c>] (platform_driver_register+0x0/0x60) from [<c01a7ae4>] (dss_init_platform_driver+0x14/0x1c)
[<c01a7ad0>] (dss_init_platform_driver+0x0/0x1c) from [<c01a742c>] (omap_dss_probe+0x3c/0x200)
[<c01a73f0>] (omap_dss_probe+0x0/0x200) from [<c01fa2e8>] (platform_drv_probe+0x20/0x24)
[<c01fa2c8>] (platform_drv_probe+0x0/0x24) from [<c01f8e3c>] (driver_probe_device+0xd0/0x1b4)
[<c01f8d6c>] (driver_probe_device+0x0/0x1b4) from [<c01f8f8c>] (__driver_attach+0x6c/0x90)
 r7:df443ef0 r6:c04755a4 r5:c045ea64 r4:c045ea30
[<c01f8f20>] (__driver_attach+0x0/0x90) from [<c01f77b0>] (bus_for_each_dev+0x58/0x98)
 r6:c04755a4 r5:c01f8f20 r4:00000000
[<c01f7758>] (bus_for_each_dev+0x0/0x98) from [<c01f8c58>] (driver_attach+0x20/0x28)
 r7:df472800 r6:c04755a4 r5:c04755a4 r4:c043e388
[<c01f8c38>] (driver_attach+0x0/0x28) from [<c01f8034>] (bus_add_driver+0xb4/0x230)
[<c01f7f80>] (bus_add_driver+0x0/0x230) from [<c01f9614>] (driver_register+0xac/0x138)
[<c01f9568>] (driver_register+0x0/0x138) from [<c01fa678>] (platform_driver_register+0x4c/0x60)
 r8:00000000 r7:00000013 r6:c00373ec r5:c043e464 r4:c043e388
[<c01fa62c>] (platform_driver_register+0x0/0x60) from [<c042a0fc>] (omap_dss_init2+0x14/0x1c)
[<c042a0e8>] (omap_dss_init2+0x0/0x1c) from [<c0008770>] (do_one_initcall+0x9c/0x164)
[<c00086d4>] (do_one_initcall+0x0/0x164) from [<c04122f4>] (kernel_init+0x90/0x138)
[<c0412264>] (kernel_init+0x0/0x138) from [<c00373ec>] (do_exit+0x0/0x6c4)
 r5:c0412264 r4:00000000

And the reason is that a platform _driver_ (omapdss_dss) is being
registered while a platform device (omapdss) is being probed.

This is a very bad idea.  There is absolutely no reason to register
drivers from within a probe function - to put it another way, this
code is absolutely insane.

Why?  Because you're destroying the whole idea that drivers only ever
get registered once.  If you happen to have two omapdss devices (okay
that probably won't happen yet) then you'll register those driver
structures twice which will cause all hell to break loose.

Moreover - and this is why it's failing - when devices are probed, their
mutex is held.  And not just _their_ mutex: their direct parent's mutex
is held as well.

So, when the omapdss_dss driver is registered while the omapdss device is
being probed, and there's already an omapdss_dss platform device present,
the driver model tries to bind the omapdss_dss platform device with the
newly registered omapdss_dss platform driver.

That binding wants to take the mutex on the omapdss device (the parent of
the omapdss_dss device), but wait, that's already held by the thread
registering the omapdss_dss platform driver.  Hence, deadlock.
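
To make the locking sequence above concrete, here is a minimal user-space
sketch.  It is not kernel code: struct model_device, model_attach() and
the other model_* names are made up for illustration, and pthread mutexes
stand in for the device mutexes.  Because the model is single-threaded, it
uses pthread_mutex_trylock() and reports EDEADLK at the point where the
real mutex_lock() would simply sleep forever.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a device: a lock and a parent. */
struct model_device {
	struct model_device *parent;
	struct model_device *child;	/* device the bad probe tries to bind */
	pthread_mutex_t lock;
};

typedef int (*model_probe_fn)(struct model_device *dev);

/*
 * The real mutex_lock() would block here.  In this single-threaded model,
 * EBUSY from trylock can only mean this thread already holds the lock, so
 * report that as EDEADLK instead of hanging.
 */
static int model_lock(struct model_device *dev)
{
	int err = pthread_mutex_trylock(&dev->lock);

	return err == EBUSY ? EDEADLK : err;
}

/* Locking order as described above: the parent first, then the device. */
static int model_attach(struct model_device *dev, model_probe_fn probe)
{
	int err;

	if (dev->parent) {
		err = model_lock(dev->parent);
		if (err)
			return err;
	}
	err = model_lock(dev);
	if (!err) {
		if (probe)
			err = probe(dev);
		pthread_mutex_unlock(&dev->lock);
	}
	if (dev->parent)
		pthread_mutex_unlock(&dev->parent->lock);
	return err;
}

/* The anti-pattern: binding the child's driver from inside probe. */
static int model_bad_probe(struct model_device *dev)
{
	return model_attach(dev->child, NULL);
}

/* nested != 0 registers the child driver while the parent is being probed. */
int model_run(int nested)
{
	struct model_device core = { NULL, NULL, PTHREAD_MUTEX_INITIALIZER };
	struct model_device sub = { &core, NULL, PTHREAD_MUTEX_INITIALIZER };
	int err;

	core.child = &sub;
	if (nested) {
		err = model_attach(&core, model_bad_probe);
	} else {
		err = model_attach(&core, NULL);
		if (!err)
			err = model_attach(&sub, NULL);
	}
	return err;
}
```

In this model, model_run(1) returns EDEADLK because the nested attach
needs the parent's lock while the outer attach still holds it, whereas
model_run(0), which binds the two drivers one after the other, returns 0.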

This mess has been created by all those
	"DSS2: xxx: create platform_driver, move init, exit to driver"

commits, and they're all _wrong_ for the above reason.

However, I doubt simply moving the driver registration calls out of the
probe function will be enough - "OMAP: DSS2: Fix init and unit sequence"
hints that there's a dependence in the driver initialization order.
That's another finger pointing at the approach being wrong, because
there is _no_ guarantee as to the order in which drivers or devices are
probed by the driver model.