On Saturday 22 November 2008, Michael Buesch wrote:
> On Saturday 22 November 2008 16:32:08 Larry Finger wrote:
> > Michael Buesch wrote:
> > > Somebody disabled MMIO and busmastering.
> > > And somebody cleared the CACHE_LINE_SIZE register.
> >
> > Are these all the read/write bits in the configuration area? Should I
> > conclude that someone zeroed this area?
>
> Yeah well. I'm not sure. It _looks_ like someone completely cut the
> physical power line to the card and it reset its complete PCI config.
> So well, X does poke with the PCI devices. But as you said it also happens
> if X doesn't run, I'd rule that out.
> But I would not rule out a fucked BIOS, yet.
> Does the BIOS have any powersave options and/or spread-spectrum options for
> the PCI-bus? Can you try to turn them all off?
> I have a machine that has PCI-slot autodetect and turns off the PCI clock,
> if it doesn't detect a card on that slot. Also turn that off, if you have
> it, too.
>
> > In case the kernel memory diagnostics don't help, is there any way to
> > trap writes to the configuration registers?
>
> Well, if we have random memory corruption, that can hit memory and MMIO.
> It doesn't hurt to turn on all debugging options. Often you get some hint
> by doing so.

I've enabled all CONFIG*DEBUG I could find relevant, and ran the system with:

'debug memory_corruption_check=1 devres.log=1 debug_objects debugpat acpi.debug_layer=0x00410002 acpi.debug_level=0xffffffff'

but no hint appears in the logs during the failure.

I did find that certain events recreate the problem immediately.
If I run 'xset dpms force standby', it happens on wakeup. 'xset -dpms' causes
it immediately as well. If I load X without DPMS support, it still happens
after the monitor is woken up from (hardware?) blackness.

--yuval
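
For anyone following the thread who wants to check for the state described above (MMIO and busmastering disabled, CACHE_LINE_SIZE cleared) before and after a DPMS wakeup, here is a minimal sketch that decodes those fields from the device's config space as exposed by sysfs. It is not something used in this thread; the helper names and the example device address are placeholders. The register offsets and bit positions are the standard PCI header layout.

```python
import struct

# Standard PCI configuration header offsets and command-register bits.
PCI_COMMAND = 0x04          # 16-bit command register
PCI_CACHE_LINE_SIZE = 0x0C  # 8-bit cache line size register
PCI_COMMAND_MEMORY = 0x2    # bit 1: memory space (MMIO) decoding enabled
PCI_COMMAND_MASTER = 0x4    # bit 2: bus mastering enabled


def decode_config(header: bytes) -> dict:
    """Decode the fields discussed above from a raw PCI config header."""
    if len(header) < 0x10:
        raise ValueError("need at least the first 16 bytes of config space")
    # Config space is little-endian; the command register is 16 bits wide.
    (command,) = struct.unpack_from("<H", header, PCI_COMMAND)
    return {
        "mmio_enabled": bool(command & PCI_COMMAND_MEMORY),
        "busmaster_enabled": bool(command & PCI_COMMAND_MASTER),
        "cache_line_size": header[PCI_CACHE_LINE_SIZE],
    }


def read_config(bdf: str) -> bytes:
    """Read the first 64 bytes of config space via sysfs (Linux only).

    `bdf` is a placeholder such as "0000:02:00.0"; take the real
    address of the card from `lspci -D`.
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)
```

Calling `decode_config(read_config(...))` once while the card works and once after the failure would show whether only these bits changed or the whole header was reset. It does not trap the write itself; it only makes the before/after comparison easy to script.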