On Fri, Sep 11, 2009 at 7:09 AM, Dan McGee <dpmcgee@xxxxxxxxx> wrote:
> On Thu, Sep 10, 2009 at 6:33 PM, Dan McGee <dpmcgee@xxxxxxxxx> wrote:
>> On Thu, Sep 10, 2009 at 10:38 AM, Tobias Powalowski <t.powa@xxxxxx> wrote:
>>> Hi guys,
>>> kernel 2.6.31 first test run ...
>>
>> Looking decent here. Noticed a few things:
>>
>> * new dmesg messages, not sure if they are of concern or not:
>>
>> ACPI: CPU0 (power states: C1[C1] C2[C2])
>> processor LNXCPU:00: registered as cooling_device0
>> ACPI: Processor [CPU0] (supports 8 throttling states)
>> ACPI: SSDT 00000000cfee8a00 00152 (v01 PmRef Cpu1Ist 00003000 INTL 20040311)
>> ACPI Error (psparse-0537): Method parse/execution failed
>> [\_PR_.CPU1._PDC] (Node ffff88022f81e120), AE_ALREADY_EXISTS
>> ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error
>> ACPI: CPU1 (power states: C1[C1] C2[C2])
>> processor LNXCPU:01: registered as cooling_device1
>> ACPI: Processor [CPU1] (supports 8 throttling states)
>> ACPI: SSDT 00000000cfee8b60 00152 (v01 PmRef Cpu2Ist 00003000 INTL 20040311)
>> ACPI Error (psparse-0537): Method parse/execution failed
>> [\_PR_.CPU2._PDC] (Node ffff88022f81e1a0), AE_ALREADY_EXISTS
>> ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error
>> ACPI: CPU2 (power states: C1[C1] C2[C2])
>> processor LNXCPU:02: registered as cooling_device2
>> ACPI: Processor [CPU2] (supports 8 throttling states)
>> ACPI: SSDT 00000000cfee8cc0 00152 (v01 PmRef Cpu3Ist 00003000 INTL 20040311)
>> ACPI Error (psparse-0537): Method parse/execution failed
>> [\_PR_.CPU3._PDC] (Node ffff88022f81e220), AE_ALREADY_EXISTS
>> ACPI: Marking method _PDC as Serialized because of AE_ALREADY_EXISTS error
>> ACPI: CPU3 (power states: C1[C1] C2[C2])
>> processor LNXCPU:03: registered as cooling_device3
>> ACPI: Processor [CPU3] (supports 8 throttling states)
>>
>> * When /etc/rc.d/microcode/ ran in my daemons, it spit out a
>> "/etc/rc.d/microcode: /dev/cpu/microcode not a character device"
>> message. Interestingly enough it still looks like it ran the microcode
>> update as there were messages in dmesg. However, if I run it now it is
>> just fine (and that device does exist). Race condition somewhere?
>
> Failboat when I woke up this morning. Machine (X) was completely
> unresponsive, and I ssh-ed in and a bunch of things were all jacked
> up. Grabbed something useful out of dmesg though:
>
> [drm] wait for fifo failed status : 0xE57004A4 0x00FF0F02
> <Above message was in there 240 times>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffffa0569131>] radeon_read_ring_rptr+0x31/0x70 [radeon]
> PGD 2112ea067 PUD 2112a6067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP
> last sysfs file:
> /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/scsi_level
> CPU 0
> Modules linked in: radeon drm nfs lockd fscache nfs_acl auth_rpcgss
> sunrpc coretemp cpufreq_ondemand it87 hwmon_vid ipv6 ipt_REJECT
> xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
> iptable_filter ip_tables x_tables microcode ext3 jbd usbhid hid
> snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
> snd_pcm_oss snd_mixer_oss snd_hda_codec_atihdmi snd_hda_codec_realtek
> uhci_hcd snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd
> soundcore snd_page_alloc ohci1394 ieee1394 ehci_hcd usbcore i2c_i801
> i2c_core sg iTCO_wdt iTCO_vendor_support r8169 mii intel_agp evdev
> thermal fan button battery ac acpi_cpufreq freq_table processor
> rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 raid1 md_mod sr_mod
> cdrom sd_mod ata_generic ahci pata_jmicron pata_acpi libata scsi_mod
> Pid: 3694, comm: X Not tainted 2.6.31-ARCH #1 EP45-DS3R
> RIP: 0010:[<ffffffffa0569131>] [<ffffffffa0569131>]
> radeon_read_ring_rptr+0x31/0x70 [radeon]
> RSP: 0018:ffff88022b597b98 EFLAGS: 00010246
> RAX: ffff88022b7ba180 RBX: ffff88022e681800 RCX: 000000000000002c
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88022e681800
> RBP: 0000000000000010 R08: 00000000ffffffff R09: 000014f4e884a645
> R10: 0000000000000001 R11: ffff880028047958 R12: 0000000000000008
> R13: ffff88022c08aa30 R14: ffff88022ea14900 R15: 0000000000000000
> FS: 00007fda900666f0(0000) GS:ffff880028034000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000211188000 CR4: 00000000000406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process X (pid: 3694, threadinfo ffff88022b596000, task ffff88022e96dcc0)
> Stack:
> ffff88022c08aa30 00000000643c49d9 ffff88022e0a6f00 ffffffffa0569c43
> <0> 00000000ffffffff 00000000643c49d9 ffff88022e681800 ffffffffa057e470
> <0> ffff88022e685000 00000000643c49d9 ffff88022c08aa30 ffff88022e681800
> Call Trace:
> [<ffffffffa0569c43>] ? radeon_commit_ring+0x63/0xe0 [radeon]
> [<ffffffffa057e470>] ? r600_do_cp_idle+0xd0/0x140 [radeon]
> [<ffffffffa056d6a6>] ? radeon_do_release+0x76/0x240 [radeon]
> [<ffffffffa053d4b1>] ? drm_lastclose+0x51/0x330 [drm]
> [<ffffffff81120fc5>] ? __fput+0xe5/0x240
> [<ffffffff8111caa7>] ? filp_close+0x67/0xb0
> [<ffffffff8105bd75>] ? put_files_struct+0x85/0x120
> [<ffffffff8105da9c>] ? do_exit+0x16c/0x7d0
> [<ffffffff8104d4b0>] ? finish_task_switch+0x180/0x190
> [<ffffffff8105e156>] ? do_group_exit+0x56/0xd0
> [<ffffffff8106d641>] ? get_signal_to_deliver+0x2a1/0x470
> [<ffffffff8100b793>] ? do_notify_resume+0x123/0x830
> [<ffffffff811316f9>] ? vfs_ioctl+0xa9/0xd0
> [<ffffffff81131880>] ? do_vfs_ioctl+0xa0/0x5a0
> [<ffffffff8101843e>] ? restore_i387_xstate+0x18e/0x1f0
> [<ffffffff8100c47b>] ? sysret_signal+0x7e/0xcf
> Code: 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 f6 87 d6 03 00 00 08
> 75 33 48 8b 87 10 01 00 00 c1 ee 02 89 f6 48 c1 e6 02 48 03 70 18 <8b>
> 06 48 8b 54 24 08 65 48 33 14 25 28 00 00 00 75 1e 48 83 c4
> RIP [<ffffffffa0569131>] radeon_read_ring_rptr+0x31/0x70 [radeon]
> RSP <ffff88022b597b98>
> CR2: 0000000000000000
> ---[ end trace 1cad1c27957ccafb ]---
> Fixing recursive fault but reboot is needed!
>
> No binary modules, no taint. Haven't searched around yet to see if
> anyone else is seeing this.
>
> -Dan
>

Found my oops at kerneloops, but no idea where to go with it:
http://www.kerneloops.org/guilty.php?guilty=radeon_read_ring_rptr&version=2.6.31-release&start=2064384&end=2097151&class=oops
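If the "/dev/cpu/microcode not a character device" message quoted in this thread is the race speculated about (the init script running before the device node has been created), one workaround is to wait briefly for the node before loading microcode. The following is a minimal Python sketch under that assumption only; `wait_for_char_device` and its `timeout`/`interval` parameters are hypothetical names and are not part of the /etc/rc.d/microcode script.

```python
import os
import stat
import time


def wait_for_char_device(path: str, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll until `path` exists and is a character device, or `timeout` seconds pass.

    Hypothetical helper: returns True if the node appeared as a character
    device in time, False otherwise (missing, or some other file type).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # os.stat follows symlinks, so a symlink to a char device also counts.
            if stat.S_ISCHR(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # node not created yet; keep polling until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_char_device("/dev/cpu/microcode", timeout=10.0)` returns True once the node exists as a character device and False if it never shows up within ten seconds. Inside an rc.d script the same idea would more naturally be a short shell loop around `[ -c /dev/cpu/microcode ]`.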
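Each frame in the oops above is printed as `symbol+0xOFFSET/0xLENGTH [module]`: the fault was 0x31 bytes into `radeon_read_ring_rptr`, a function 0x70 bytes long, in the `radeon` module. When comparing a trace against reports such as the kerneloops.org entry, a small parser for that format can help. This is a hypothetical Python helper (`parse_frame` and `Frame` are illustrative names, not part of any kernel tool), assuming the x86-64 trace layout shown in this report.

```python
import re
from typing import NamedTuple, Optional

# Matches "symbol+0xOFFSET/0xLENGTH" with an optional trailing "[module]",
# as printed in the IP:, RIP and Call Trace lines of this oops.
_FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


class Frame(NamedTuple):
    symbol: str
    offset: int            # bytes from the start of the function
    size: int              # total length of the function in bytes
    module: Optional[str]  # None for code built into the kernel image


def parse_frame(line: str) -> Optional[Frame]:
    """Extract the first symbol+offset/size frame from an oops line, if any."""
    m = _FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(m["sym"], int(m["off"], 16), int(m["size"], 16), m["module"])
```

To go beyond the symbol, the offset can be resolved to a source line with gdb against a radeon.ko built with debug info (for example `list *(radeon_read_ring_rptr+0x31)`), and the kernel tree's scripts/decodecode can disassemble the `Code:` bytes.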