Thanks for that Andrew, I'll try the latest CVS version in a while. I'm having trouble getting 2.6.13 to boot at the moment... I'll check whether your modifications are OK once I've got that booting.

Back to the other IRQ problem... I've done some more testing. I've now tested it on 2 servers which have similar, but not exactly the same, hardware. It occurs on our HP Proliant ML350 G4 and our HP Proliant ML370 G4. The problem only seems to occur on bootup from a poweroff. If I reboot, it then works correctly until I next power it off.

There are 2 different printouts in /var/log/messages depending on whether the card has a CI or not.

With:

Jul 13 16:27:34 bloodhound kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Jul 13 16:27:34 bloodhound kernel: You probably have a hardware problem with your RAM chips
Jul 13 16:27:34 bloodhound kernel: irq 11: nobody cared!
Jul 13 16:27:34 bloodhound kernel: [<c014efc4>] __report_bad_irq+0x24/0x7d
Jul 13 16:27:34 bloodhound kernel: [<c014f0a6>] note_interrupt+0x6b/0x89
Jul 13 16:27:34 bloodhound kernel: [<c014e5c1>] __do_IRQ+0x180/0x2ff
Jul 13 16:27:34 bloodhound kernel: [<c011b6d2>] scheduler_tick+0x1a/0x4d4
Jul 13 16:27:34 bloodhound kernel: [<c0105928>] do_IRQ+0x69/0x85
Jul 13 16:27:34 bloodhound kernel: [<c0103aea>] common_interrupt+0x1a/0x20
Jul 13 16:27:34 bloodhound kernel: [<c014e407>] handle_IRQ_event+0x28/0x62
Jul 13 16:27:34 bloodhound kernel: [<c014e4ff>] __do_IRQ+0xbe/0x2ff
Jul 13 16:27:34 bloodhound kernel: [<c010590b>] do_IRQ+0x4c/0x85
Jul 13 16:27:34 bloodhound kernel: =======================
Jul 13 16:27:34 bloodhound kernel: [<c012c6e0>] process_timeout+0x0/0x5
Jul 13 16:27:34 bloodhound kernel: [<c0103aea>] common_interrupt+0x1a/0x20
Jul 13 16:27:34 bloodhound kernel: [<c012797c>] __do_softirq+0x2c/0x8a
Jul 13 16:27:34 bloodhound kernel: [<c0105a17>] do_softirq+0x39/0x40
Jul 13 16:27:34 bloodhound kernel: =======================
Jul 13 16:27:34 bloodhound kernel: [<c0105912>] do_IRQ+0x53/0x85
Jul 13 16:27:34 bloodhound kernel: [<c0103aea>] common_interrupt+0x1a/0x20
Jul 13 16:27:34 bloodhound kernel: [<c0112592>] delay_pmtmr+0xb/0x13
Jul 13 16:27:34 bloodhound kernel: [<c0208779>] __delay+0x9/0xa
Jul 13 16:27:34 bloodhound kernel: [<e08fc1cc>] start_ts_capture+0x14a/0x280 [budget_core]
Jul 13 16:27:34 bloodhound kernel: [<e08fd475>] ttpci_budget_set_video_port+0xfd/0x1a8 [budget_core]
Jul 13 16:27:34 bloodhound kernel: [<c011bbcd>] __wake_up_common+0x35/0x55
Jul 13 16:27:34 bloodhound kernel: [<e091a4c8>] ciintf_slot_shutdown+0x2d/0x31 [budget_ci]
Jul 13 16:27:34 bloodhound kernel: [<e09a6d9b>] dvb_ca_en50221_slot_shutdown+0x5d/0xf5 [dvb_core]
Jul 13 16:27:34 bloodhound kernel: [<e09a776a>] dvb_ca_en50221_io_do_ioctl+0x116/0x151 [dvb_core]
Jul 13 16:27:34 bloodhound kernel: [<e099f5d8>] dvb_usercopy+0x93/0x102 [dvb_core]
Jul 13 16:27:34 bloodhound kernel: [<e099f0b3>] dvb_device_open+0x54/0xe8 [dvb_core]
Jul 13 16:27:34 bloodhound kernel: [<c0182c51>] chrdev_open+0x14c/0x36b
Jul 13 16:27:34 bloodhound kernel: [<c01e9ec4>] file_alloc_security+0x29/0x7d
Jul 13 16:27:34 bloodhound kernel: [<c0176019>] dentry_open+0xaf/0x1a5
Jul 13 16:27:34 bloodhound kernel: [<e09a77a5>] dvb_ca_en50221_io_ioctl+0x0/0x1d [dvb_core]
Jul 13 16:27:34 bloodhound kernel: [<e09a77bd>] dvb_ca_en50221_io_ioctl+0x18/0x1d [dvb_core]
Jul 13 16:27:34 bloodhound kernel: [<e09a7654>] dvb_ca_en50221_io_do_ioctl+0x0/0x151 [dvb_core]
Jul 13 16:27:34 bloodhound kernel: [<c018de89>] do_ioctl+0x39/0x52
Jul 13 16:27:34 bloodhound kernel: [<c018df97>] vfs_ioctl+0x55/0x193
Jul 13 16:27:34 bloodhound kernel: [<c018e134>] sys_ioctl+0x5f/0x6f
Jul 13 16:27:34 bloodhound kernel: [<c010392d>] syscall_call+0x7/0xb
Jul 13 16:27:34 bloodhound kernel: handlers:
Jul 13 16:27:34 bloodhound kernel: [<e08c70af>] (interrupt_hw+0x0/0x345 [saa7146])
Jul 13 16:27:34 bloodhound kernel: [<c02c4d70>] (usb_hcd_irq+0x0/0x57)
Jul 13 16:27:35 bloodhound kernel: [<c02c4d70>] (usb_hcd_irq+0x0/0x57)
Jul 13 16:27:35 bloodhound kernel: Disabling IRQ #11
Jul 13 16:27:39 bloodhound kernel: dvb_ca adaptor 0: PC card did not respond :(

Without:

Jul 15 14:08:48 bloodhound kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Jul 15 14:08:48 bloodhound kernel: You probably have a hardware problem with your RAM chips

Message from syslogd@bloodhound at Fri Jul 15 14:08:49 2005 ...
bloodhound kernel: Disabling IRQ #11

Jul 15 14:08:48 bloodhound kernel: irq 11: nobody cared!
Jul 15 14:08:48 bloodhound kernel: [<c0153cf4>] __report_bad_irq+0x24/0x80
Jul 15 14:08:48 bloodhound kernel: [<c0153e0e>] note_interrupt+0x8e/0xb0
Jul 15 14:08:48 bloodhound kernel: [<c015325d>] __do_IRQ+0x20d/0x320
Jul 15 14:08:48 bloodhound kernel: [<c011e32a>] scheduler_tick+0x1a/0x4f0
Jul 15 14:08:48 bloodhound kernel: [<c010599b>] do_IRQ+0x9b/0xa0
Jul 15 14:08:48 bloodhound kernel: [<c0103c16>] common_interrupt+0x1a/0x20
Jul 15 14:08:48 bloodhound kernel: [<c0153009>] handle_IRQ_event+0x29/0x70
Jul 15 14:08:48 bloodhound kernel: [<c015314c>] __do_IRQ+0xfc/0x320
Jul 15 14:08:48 bloodhound kernel: [<c0105958>] do_IRQ+0x58/0xa0
Jul 15 14:08:48 bloodhound kernel: =======================
Jul 15 14:08:48 bloodhound kernel: [<c0103c16>] common_interrupt+0x1a/0x20
Jul 15 14:08:48 bloodhound kernel: [<c012b38e>] __do_softirq+0x2e/0xa0
Jul 15 14:08:48 bloodhound kernel: [<c0105a81>] do_softirq+0x41/0x50
Jul 15 14:08:48 bloodhound kernel: =======================
Jul 15 14:08:48 bloodhound kernel: [<c010595f>] do_IRQ+0x5f/0xa0
Jul 15 14:08:48 bloodhound kernel: [<c0105277>] do_nmi+0x47/0x60
Jul 15 14:08:48 bloodhound kernel: [<c0103c16>] common_interrupt+0x1a/0x20
Jul 15 14:08:48 bloodhound kernel: handlers:
Jul 15 14:08:48 bloodhound kernel: [<e08af240>] (interrupt_hw+0x0/0x370 [saa7146])
Jul 15 14:08:48 bloodhound kernel: [<e08af240>] (interrupt_hw+0x0/0x370 [saa7146])
Jul 15 14:08:48 bloodhound kernel: [<c02f0a40>] (usb_hcd_irq+0x0/0x70)
Jul 15 14:08:49 bloodhound kernel: [<c02f0a40>] (usb_hcd_irq+0x0/0x70)
Jul 15 14:08:49 bloodhound kernel: Disabling IRQ #11

So far we've tested...

Nexus-S                       = Works
Nova-S                        = Problem Occurs
Nova-S CI                     = Works
Nexus-S & Nova-S CI           = Problem Occurs
Nexus-S & Nova-S CI & Nova-S  = Problem Occurs
Nova-S & Nova-S CI            = Problem Occurs
Nova-S & Nova-S               = Problem Occurs

Note: the Nova-S is a different card without the CI connector, not just a Nova-S CI with the CI unplugged.

If I watch /proc/interrupts when I start vlc, the NMI count goes to 1 and then the IRQ 11 count increases at a huge rate. The "Disabling IRQ #11" message appears roughly 2 seconds after the NMI is generated. Looking at the kernel source, an IRQ is disabled like this if more than 99,900 interrupts out of 100,000 are not handled.

This is the output on the ML370...

           CPU0
  0:     715364          XT-PIC  timer
  1:        530          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
  9:          1          XT-PIC  acpi
 10:      11805          XT-PIC  ioc0, ioc1, saa7146 (0), saa7146 (1), uhci_hcd:usb3, uhci_hcd:usb4
 11:    3900000          XT-PIC  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb5, eth0
 12:        101          XT-PIC  i8042
 14:       5826          XT-PIC  ide0
NMI:          1
ERR:          2

The ML350 has the 2 saa7146 cards on IRQ 11 instead...

I've also tried having VLC display the picture locally on the server, so that no other devices are involved, and I get the error again.

Does anyone have any ideas what it could be or where I should look next?

Thanks a lot!

On 7/23/05, Andrew de Quincey <adq_dvb@xxxxxxxxxxxxx> wrote:
> On Friday 15 July 2005 08:54, Michael Ditum wrote:
> > We've managed to fix the issue and the card is now running quite
> > happily without any messages going to /var/log/messages
> >
> > We only modified a couple of lines, basically we comment out the line
> > that says it supports IRQ's so the polling section is reached and then
> > instead of directly calling the dvb_ca_en50221_read_data function we
> > wake the thread that calls it.
> >
> > I've attached the patch to the latest CVS drivers.
>
> Hi, thanks for the patch. However I decided to implement it in a different way
> so that the IRQ mode is still preserved. I've converted it to use read/write
> spinlocks instead of semaphores since they are usable from within an
> interrupt context. Let me know if this causes any problems and I'll fix them.
>
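
P.S. For anyone who wants to reason about the "nobody cared" check without reading the kernel source, here is a rough Python model of the rule described in my message (an IRQ line is shut off when more than 99,900 of the last 100,000 interrupts went unhandled). This is only a sketch of my reading of that logic, not the kernel code itself: the class name IrqLine, its fields, and the two constants are my own labels for illustration.

```python
from dataclasses import dataclass

# Assumed values, taken from the description in this message.
WINDOW = 100_000           # interrupts counted before the check runs
UNHANDLED_LIMIT = 99_900   # more unhandled than this in a window => disable


@dataclass
class IrqLine:
    """Hypothetical stand-in for the per-IRQ bookkeeping."""
    irq_count: int = 0
    irqs_unhandled: int = 0
    disabled: bool = False


def note_interrupt(line: IrqLine, handled: bool) -> None:
    """Record one interrupt and disable the line if the window looks stuck."""
    if line.disabled:
        return
    if not handled:
        line.irqs_unhandled += 1
    line.irq_count += 1
    if line.irq_count < WINDOW:
        return
    # A full window has elapsed: decide, then start a new window.
    line.irq_count = 0
    if line.irqs_unhandled > UNHANDLED_LIMIT:
        line.disabled = True   # corresponds to "Disabling IRQ #11"
    line.irqs_unhandled = 0


if __name__ == "__main__":
    storm = IrqLine()
    for _ in range(WINDOW):
        note_interrupt(storm, handled=False)  # no handler claims the interrupt
    print("disabled after storm:", storm.disabled)
```

If this model is right, a line that is disabled about 2 seconds after the storm starts must have taken up to 100,000 interrupts in that time, i.e. tens of thousands per second, which fits the IRQ 11 counter racing upwards in /proc/interrupts.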