On 07/13/2010 09:26 AM, David C. Rankin wrote:
Can anyone think of a possible mechanism that would cause a kernel to
boot once after rebuilding the initramfs, but then be corrupt for every boot
thereafter? As mentioned in the title, on the 2nd boot attempt (and all
subsequent attempts), the boot process either hard-locks when the "Setting up
UTF-8 mode" message is displayed, or a kernel NULL pointer message is
displayed and then I get 3 screens of garbage before the box either locks or a
ctrl+c kills that part of the boot process and booting proceeds until it craters
4-10 steps later.
Could the NULL pointer blow-up be due to incorrect gpu handling by the Arch
kernel when the modules are loaded (about the same time the KMS magic is
taking place)?
I say this because I have one of ATI's less common GPUs in this Toshiba laptop.
The video card is:
Radeon X1250 Graphics(690G Chipset), RS690M, RV410 Graphics Core. This uses the
onboard PCIe bus interface and has API support for DirectX 9.0b and OpenGL 2.0.
For some reason the kernel crashes 'smell' like a mishandling of the gpu
subsystem in the 2.6.34 kernels (Note: this is just a 'gut feel', and I can't
point to anything in particular). Of all the things that could have changed over
the past 2 kernels, the KMS magic and a possible bug slipping in for this card
seem like one of the likely areas to start looking.
The lspci -vv data for the card are as follows (I have opensuse running at the
moment - thus the fglrx driver is shown):
01:05.0 VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series] (prog-if 00 [VGA controller])
        Subsystem: Toshiba America Info Systems Device ff00
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at f0000000 (64-bit, prefetchable) [size=128M]
        Region 2: Memory at f8100000 (64-bit, non-prefetchable) [size=64K]
        Region 4: I/O ports at 9000 [size=256]
        Region 5: Memory at f8000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000 Data: 0000
        Kernel driver in use: fglrx_pci
        Kernel modules: fglrx
I don't know why the card is reporting as an X1200 in lspci. The Core Clock for
this gpu is 400 MHz and according to AMD, that means this is the 1250 and not
the 1200 because the Core Clock on the 1200 is 350 MHz.
I don't know what, if any, changes took place in KMS or in gpu initialization
for the 2.6.34 kernel, but this card always sucked when using the ATI driver,
which prevented me from moving to Arch sooner on this box. Then with the 2.6.32
& 2.6.33 kernels, it was like somebody turned on a light-switch in the kernel
and I was getting blazing fast performance out of the xf86-video-ati driver on
Arch, compiz was working great, and the gpu subsystem was working better than
ever before in Arch with just the 'radeon' driver.
When I updated to 2.6.34-2, I ran into the problem with compiz "whitescreening",
and video performance 'tanked' on the one 'first boot' that did come up.
Then on every attempt to boot thereafter, the boot would fail and either hang
or blow up with the kernel NULL pointer error.
That has me thinking that this problem has to be related to some module
rearrangement/updating that takes place after you boot the box for the first
time -- thus preventing the next boot from working.
I don't know how to verify or check this out, but this is what my gut tells me
is going on.
Arch gurus -- any way to test this hypothesis??
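One rough way to check whether anything on disk really changes between the
first boot and the second is to hash every file under a directory tree right
after rebuilding the initramfs, hash it again after the first successful boot,
and compare the two results. The Python sketch below does only that. The
function names snapshot and diff_snapshots are made up for illustration, and
which trees to point it at (/boot and /lib/modules look like the obvious
candidates) is an assumption, not something known to be the cause here.

```python
import hashlib
import os


def snapshot(root):
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large images are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def diff_snapshots(before, after):
    """Return (added, removed, changed) as sorted lists of relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        p for p in set(before) & set(after) if before[p] != after[p]
    )
    return added, removed, changed
```

The first snapshot has to be saved somewhere that survives the reboot (for
example dumped to a file with the json module). If diff_snapshots comes back
with three empty lists for both trees, the "something gets rewritten after the
first boot" theory looks a lot less likely, at least for files in those trees.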
--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com