Firstly... hope everyone enjoyed (or is still enjoying) their holidays. I've just joined the mailing list. Apologies for the longish post, but I wanted to provide as much information as possible.

I've got a couple of Cobalt boxes (a RaQ2 and a Qube2) successfully running Debian (as well as Gentoo) on the 2.6.9 branch of CVS plus Peter Horton's patches. The machines are quite stable and I really haven't run into any issues until I start stress-testing the networking. Under high network load when connected at 100TX FDX, I can always get the Cobalts to lock up. At anything slower -- 100TX HDX, 10TX FDX, 10TX HDX -- everything works fine.

The problem is specific to:

1. Lots of data transfer to/from the Cobalt.
2. Running at 100TX FDX.

The test scenario: the Cobalt runs 2.6.9 CVS + PH's patches and acts as an NFS server. From another machine on the network (also running at 100TX FDX), mount the NFS export and copy something huge (in my case a directory holding about 2GB worth of files). It'll usually get anywhere from 300MB to 800MB of data across before the Cobalt just locks up -- no kernel panic, just a hard lockup that necessitates cycling the power manually.

At first I thought it might be NFS, so I tried something far less kernel-dependent like FTP and hit the same problem. I also recompiled the kernel with:

    CONFIG_TULIP_MWI=n
    CONFIG_TULIP_MMIO=n
    CONFIG_TULIP_NAPI=n
    CONFIG_TULIP_NAPI_HW_MITIGATION=n

as well as with all of them set to y, to no avail.

As a last resort I turned on lots of debugging output (I set tulip_debug to 99) and finally got something usable out of the kernel:

    eth0: MII status 782d, Link partner report 45e1.
    eth0: 21143 negotiation status 000000c6, MII.
    Badness in local_bh_enable at kernel/softirq.c:141
    Call Trace:
      [<800b32c8>] [<80084e28>] [<80397ee8>] [<80397f08>] [<80398af4>] [<8029a374>]
      [<800b87ac>] [<800ad20c>] [<8029ccbc>] [<802bc8b0>] [<802bc918>] [<802bcba8>]
      [<802575bc>] [<8027b370>] [<800abe34>] [<800abe34>] [<800b3168>] [<800abebc>]
      [<800abf80>] [<800ac6b8>] [<8022e900>] [<800ac34c>] [<8022e900>] [<800ac278>]
      [<800ac174>] [<80279dec>] [<80279980>] [<800b8954>] [<802b9458>] [<800b3168>]
      [<80084808>] [<800b3208>] [<80084e18>] [<80082908>] [<802dc1bc>] [<80083180>]
      [<802d89d8>] [<8030ffdc>] [<80303260>] [<802da298>] [<80084e28>] [<8029afb0>]
      [<8029b330>] [<80138718>] [<802dad80>] [<80134538>] [<800a4198>] [<802d07c0>]
      [<803031ec>] [<8030ffdc>] [<80214364>] [<80295864>] [<800a7440>] [<8030ffdc>]
      [<80214364>] [<80295894>] [<80295864>] [<8029b7b4>] [<80303ab8>] [<80303aa0>]
      [<8020e0d4>] [<800a7440>] [<802120f4>] [<800a40c8>] [<80310070>] [<80398698>]
      [<80398840>] [<8029ca50>] [<801d6f8c>] [<8029cd50>] [<8039918c>] [<801cab18>]
      [<8039b554>] [<8039c8a4>] [<801d8068>] [<80397740>] [<800bdbe8>] [<801c7398>]
      [<801c7274>] [<801c70cc>] [<80086070>] [<80086060>]

I'd already deduced that it was probably a problem related to interrupts (it seems we have a lot of those issues on our lovely blue boxes). Looking at the relevant lines in kernel/softirq.c yields:

    void local_bh_enable(void)
    {
            __local_bh_enable();
            WARN_ON(irqs_disabled());
            if (unlikely(!in_interrupt() && local_softirq_pending()))
                    invoke_softirq();
            preempt_check_resched();
    }
    EXPORT_SYMBOL(local_bh_enable);

So it's clear that something is calling local_bh_enable() while interrupts are disabled, which they shouldn't be. I can recreate the problem at will, so it's definitely reproducible. I've really taken this as far as I can in terms of debugging it on my own.
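For what it's worth, my understanding of that WARN_ON is that it fires whenever local_bh_enable() runs with hardware interrupts still disabled -- for example a *_bh unlock reached from hard-interrupt context. The fragment below is purely an illustration of that pattern; the names my_lock and my_interrupt are made up, and this is NOT code I've found in the tulip driver:

    /*
     * Illustrative sketch only -- not from the tulip driver.
     * This is just the shape of code that would produce
     * "Badness in local_bh_enable": a spin_unlock_bh() (which ends in
     * local_bh_enable()) reached while hard interrupts are disabled.
     */
    #include <linux/spinlock.h>
    #include <linux/interrupt.h>

    static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;

    static irqreturn_t my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
            /* Hard-IRQ context: interrupts are disabled on this CPU. */
            spin_lock_bh(&my_lock);
            /* ... poke at shared driver state ... */
            spin_unlock_bh(&my_lock);       /* calls local_bh_enable(), so
                                             * WARN_ON(irqs_disabled()) fires */
            return IRQ_HANDLED;
    }

If that reading is right, the culprit is presumably somewhere in the tulip interrupt/softirq path rather than in NFS or FTP themselves, but I haven't managed to pin it down.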
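Also, in case it helps anyone reproduce this without setting up NFS or FTP: something along the lines of the throwaway sender below should shove a comparable amount of data at the box. This is just a sketch -- I've only been testing with NFS and FTP myself -- and DEST_IP/DEST_PORT are placeholders. Run a sink on the Cobalt (e.g. nc -l -p 5001 > /dev/null) and point the sender at it from the other machine.

    /* Hypothetical traffic generator: blasts ~2GB at a TCP sink. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #define DEST_IP   "192.168.1.10"   /* placeholder: the Cobalt's address */
    #define DEST_PORT 5001             /* placeholder: whatever the sink listens on */
    #define TOTAL_MB  2048             /* roughly the size of my NFS test copy */

    int main(void)
    {
            char buf[64 * 1024];
            struct sockaddr_in sin;
            long mb;
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            if (fd < 0) { perror("socket"); return 1; }

            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_port = htons(DEST_PORT);
            sin.sin_addr.s_addr = inet_addr(DEST_IP);

            if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
                    perror("connect");
                    return 1;
            }

            memset(buf, 0xAA, sizeof(buf));
            for (mb = 0; mb < TOTAL_MB; mb++) {
                    size_t sent = 0;
                    /* push roughly 1MB per outer pass */
                    while (sent < sizeof(buf) * 16) {
                            ssize_t n = write(fd, buf, sizeof(buf));
                            if (n < 0) { perror("write"); return 1; }
                            sent += n;
                    }
            }
            close(fd);
            return 0;
    }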
I'd appreciate any and all assistance or direction on how to track down the culprit here and fix the problem.

Thanks,
Habeeb