Re: Kernel panic with load balancing

Linux Advanced Routing and Traffic Control


On Wed, 2006-03-08 at 06:24 +0100, Eduardo Fernández wrote:
> <0> Kernel panic - not syncing:Fatal exception in interrupt
> 
> Any ideas? Thank you very much!

Is it a SiS chipset and an Intel CPU? Reading some kernel sources, I
finally (after *months* of trouble) found the problem I had with a
customer. I used a router with a quad-port 100 Mbit NIC (for three DSL
modems) and three Gigabit NICs: one onboard and two cheap Realtek 8169s
(1: WLAN, 2: about a dozen clients, 3: two servers, one printer).

The Realtek driver generated one interrupt for each packet (unlike the
NAPI drivers, e.g. e1000, which keep the interrupt count low). When the
network was saturated, the interrupt controller did not handle every
interrupt in time (some chipsets, especially SiS, seem to simply drop a
handful of interrupts under these circumstances), and the whole system
froze with exactly the same message you quote. Every Thursday at about
17:00, everyone screamed because the net broke.
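
If you want to check whether a NIC is flooding the box with interrupts,
one way is to take two snapshots of /proc/interrupts and compare the
counters. The following is only a rough sketch, not something I ran on
that router: the function names are made up for illustration, and it
assumes the usual layout (a header row naming the CPUs, then one
"label: counts... description" line per interrupt source).

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    if not lines:
        return {}
    # The header row names one column per CPU (CPU0, CPU1, ...).
    ncpus = len(lines[0].split())
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Some rows (e.g. ERR) carry fewer counters than there are CPUs.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def interrupt_rates(before, after, seconds):
    """Return {label: (interrupts_per_second, description)} between snapshots."""
    rates = {}
    for label, (count, description) in after.items():
        if label in before:
            rates[label] = ((count - before[label][0]) / seconds, description)
    return rates


def sample(path="/proc/interrupts", interval=1.0):
    """Read the file twice, `interval` seconds apart, and compute rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_rates(before, after, interval)
```

Calling sample() while the link is saturated and looking at the entry
whose description names the interface (eth1 or similar) shows roughly
how many interrupts per second that card generates. A rate that tracks
the packet rate one-to-one would match the behaviour described above.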

*** report of my frustrating experience with a computer-illiterate
employee follows; you may skip the next paragraph (it only explains why
catastrophe always struck on Thursday at tea-time) ***

I only found the bug after waiting a whole Thursday afternoon, observing
every user, getting more and more nervous and for once hoping that the
shit *would* hit the fan that day. Reconstructing the last steps of an
accountant, who was the only employee leaving at exactly the time of
the crash, I finally saw the light. She was backing up the accounting
database over the LAN, and every time she started it, the net crashed
immediately (after which she switched off her PC and went home, sent off
by the screams of her colleagues, who lost quite a lot of work every
time the database server was suddenly unreachable). Of course she never
told anyone that the net always crashed exactly when she started her
backup. And of course it never occurred to her that she could, just
once, *not* start her backup and see whether the net would crash at
17:00 anyway, or whether her backup might be messing things up. I was
quite frustrated that anyone could be so stupid, trying week after week
a backup which never succeeded. She was lucky she never needed to
restore the data in all those months.

The correct solution is to replace the mainboard, because the chipset
is crap. My solution was to replace the NICs, because it was cheaper
and faster in this case. (Of course everyone then thought it was my
fault, because I had originally bought the cheap NICs. I am not sure
they understood my explanation that what was really b0rken was the
chipset of the client PC they gave me for refitting as a
router/firewall/web proxy/name-, dhcp, vpn and
everythingelseunderthesun-server. I learned to request real server
hardware for jobs like this one in the future.)

I replaced the two Realtek gigabit NICs with Intel Pro/1000 GT/MT
(desktop!) adapters (e1000 driver; I believe they used Intel's 82542
chipset). I bought them for 49 € each: not as cheap as the crappy
Realteks at 12.97 €, but not as expensive as the "server" adapters,
which sell for more than 120 €.

This immediately solved the problem for me. I hope this helps you.

