In message <1227322200.3602.17.camel@xxxxxxxxxxxxxxxxxx>, Andy Walls wrote:

hi.

>On Fri, 2008-11-21 at 15:11 -0600, Vanessa Ezekowitz wrote:
>> I'm not sure whose 'department' this is, so I'm sending this email to
>> the v4l/dvb lists...
>>
>> About a week ago, my machine started locking up randomly. Eventually
>> figured out the problem and ended up replacing my dead primary SATA
>> disk with a couple of older IDE disks. A reinstall of Ubuntu Hardy,
>> and a couple of days of the usual setup and personalizing tweaks
>> later, my system is back up and running.
>>
>> There is still one other SATA disk in my system and it is behaving
>> normally. While adding the replacement disks, I moved it to the port
>> formerly occupied by the dead disk.
>>
>> My system, for reasons beyond my understanding, insists on sharing
>> IRQ's among the various PCI devices, despite my explicit settings in
>> the BIOS to assign fixed IRQ's to my PCI slots. One of those IRQ's
>> is being shared between my capture card and SATA controller.
>> Normally, this would not be an issue, but I seem to have found a nasty
>> bug in the cx88xx driver.
>
>I'm not so sure about that (see below), but you have found a nasty
>problem with your system I think.
>
>
>> Without trying to use my capture card at all, every time I access the
>> other SATA disk in my system, the cx88 driver spits out a HORRENDOUS
>> number of weird messages, filling my system logs so fast that after
>> two days, I'd used over 6 GB just in the few logs that sysklogd
>> generates.
>
>Every time the sata controller generates an interrupt, the kernel calls
>the IRQ handler routines sharing that interrupt. Your cx88 driver
>*always* thinks it has interrupts to service - but it actually doesn't.
>
>Note how many of the dumped registers are '0xffffffff' including the
>'irq aud' interrupt status register:
>
>> Nov 21 01:59:12 rainbird kernel: cx88[0]: irq aud [0xffffffff] dn_risci1* up_risci1* rds_dn_risc1* 3* dn_risci2* up_risci2* rds_dn_risc2* 7* dnf_of* upf_uf* rds_dnf_uf* 11* dn_sync* up_sync* rds_dn_sync* 15* opc_err* par_err* rip_err* pci_abort* ber_irq* mchg_irq* 22* 23* 24* 25* 26* 27* 28* 29* 30* 31*
>
>A 0xffffffff is likely not a real interrupt status, but a PCI bus read
>error, for which the PCI-PCI bridge or Host-PCI bridge returns the all
>ones value. The interrupt handler sees every possible interrupt that it
>could be interested in as having occurred and likely tries to process
>them. The further PCI MMIO accesses are also failing as evinced by all
>the 0xffffffff values being dumped.

This is a good point. Elsewhere I have seen exactly this
kind of all ones error within an IRQ when pulling a running
cardbus device from a system.

The recommendation to remove the card and reseat it would
appear to be a sensible 1st step.

cya

--

 // /
{:)==={ Darron Broad <darron@xxxxxxxx>
 \\ \

_______________________________________________
linux-dvb mailing list
linux-dvb@xxxxxxxxxxx
http://www.linuxtv.org/cgi-bin/mailman/listinfo/linux-dvb
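
For illustration only, the all-ones condition discussed above could be
checked roughly as follows. This is a plain user-space C sketch, not code
from the cx88 driver: the names check_irq_status, irq_result and
PCI_READ_ERROR are made up for the example, and the status and mask values
are passed in as arguments instead of being read from the card. A real
kernel handler on a shared line would return IRQ_NONE or IRQ_HANDLED in the
same spirit.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not taken from the cx88 driver. */
enum irq_result { IRQ_RESULT_NONE = 0, IRQ_RESULT_HANDLED = 1 };

/* Value a bridge typically returns when a PCI read gets no answer. */
#define PCI_READ_ERROR 0xffffffffu

/*
 * Decide whether a status word read from the device describes an
 * interrupt that this handler should service.
 *   status: value read from the interrupt status register
 *   mask:   interrupt sources this driver has enabled
 */
enum irq_result check_irq_status(uint32_t status, uint32_t mask)
{
    /* All ones: most likely a failed read, not 32 real interrupts. */
    if (status == PCI_READ_ERROR)
        return IRQ_RESULT_NONE;

    /* None of our sources fired: another device on the shared line did. */
    if ((status & mask) == 0)
        return IRQ_RESULT_NONE;

    return IRQ_RESULT_HANDLED;
}
```

With a check like this, the 0xffffffff reads seen in the log would be
reported as "not ours" instead of being decoded as every interrupt source
firing at once.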