SMP/Network related oops (2.2.16)

Hi,

Lately I've been updating our SMP machine and, alongside it, built a second
SMP machine. The first one, apart from a "stuck on TLB" glitch two months ago,
never crashed.
Recently, some changes were made. Both machines now run Helix GNOME 1.2
with updates and a distributed net client, along with, of course, the Red Hat 6.2
updates.

Both machines have become highly unstable when running X. But that could
just be a manifestation of the extra load the machines are under while X is
running. I doubt the Helix binaries themselves are the cause here.

All crashes so far left no log entries whatsoever. The machine would suddenly
become extremely slow, and within 3-5 seconds the mouse would freeze along
with the entire machine. Today I managed to get a log entry, though ksymoops
can't seem to read it properly (and I can't read/match the symbols myself,
for some odd reason).

Aug 31 16:28:49 dupla kernel:
Aug 31 16:28:49 dupla kernel: wait_on_bh, CPU 0:
Aug 31 16:28:49 dupla kernel: irq:  1 [0 1]
Aug 31 16:28:49 dupla kernel: bh:   1 [0 1]
Aug 31 16:29:20 dupla kernel: <[c010be9d]> <[c0169cc2]> <[c0169d3d]> <[c017990d]> <[c0151d6f]> <[c013496b]> <[c0134ac7]> stuck on TLB IPI wait (CPU#0)
Aug 31 16:29:20 dupla kernel: stuck on TLB IPI wait (CPU#0)
Aug 31 16:29:20 dupla kernel: stuck on TLB IPI wait (CPU#0)
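For anyone who wants to pull the raw return addresses out of the log line above before feeding them to a symbol lookup, here is a minimal Python sketch. The regular expression and the helper name extract_addresses are illustrative assumptions, not part of any kernel tool; the log line is copied from the entry above.

```python
import re

# The log line exactly as it appeared in syslog (see above).
LOG_LINE = (
    "Aug 31 16:29:20 dupla kernel: <[c010be9d]> <[c0169cc2]> <[c0169d3d]> "
    "<[c017990d]> <[c0151d6f]> <[c013496b]> <[c0134ac7]> "
    "stuck on TLB IPI wait (CPU#0)"
)

# Assumption: each trace entry is printed as <[xxxxxxxx]> with 8 hex digits.
ADDR_RE = re.compile(r"<\[([0-9a-f]{8})\]>")


def extract_addresses(line):
    """Return the bracketed addresses in a log line as integers, in order."""
    return [int(hex_addr, 16) for hex_addr in ADDR_RE.findall(line)]


addresses = extract_addresses(LOG_LINE)
for addr in addresses:
    print(f"{addr:08x}")
```

The resulting list can then be compared by hand against System.map, or against whatever ksymoops prints for the same entry.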

After three of these, a fourth one happened on CPU#1, then it continued on
CPU#0 again. This time I had managed to switch back to console mode just
before the system froze completely, and managed to use SysRq-r to remount ro
and SysRq-b to reboot the machine.

Ksymoops said:

Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c010be9d <synchronize_bh+3d/50>
Trace; c0169cc2 <tcp_listen_poll+12/50>
Trace; c0169d3d <tcp_poll+3d/100>
Trace; c017990d <inet_poll+21/2c>
Trace; c0151d6f <sock_poll+1f/24>
Trace; c013496b <do_poll+7b/dc>
Trace; c0134ac7 <sys_poll+fb/17c>

819 warnings and 1 error issued.  Results may not be reliable.
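To make the trace easier to read, the "Trace;" lines above can be split into symbol, offset and function size, which also gives the start address of each function (address minus offset). This is a rough Python sketch, assuming the "address <symbol+offset/size>" layout shown above and assuming the usual innermost-frame-first ordering; the names parse_trace and TRACE_RE are made up for illustration.

```python
import re

# Trace lines copied from the ksymoops output above.
KSYMOOPS_OUTPUT = """\
Trace; c010be9d <synchronize_bh+3d/50>
Trace; c0169cc2 <tcp_listen_poll+12/50>
Trace; c0169d3d <tcp_poll+3d/100>
Trace; c017990d <inet_poll+21/2c>
Trace; c0151d6f <sock_poll+1f/24>
Trace; c013496b <do_poll+7b/dc>
Trace; c0134ac7 <sys_poll+fb/17c>
"""

# Assumed layout: "Trace; <addr> <symbol+offset/size>", all numbers in hex.
TRACE_RE = re.compile(r"^Trace; ([0-9a-f]+) <(\w+)\+([0-9a-f]+)/([0-9a-f]+)>$")


def parse_trace(text):
    """Parse ksymoops 'Trace;' lines into a list of frame dictionaries."""
    frames = []
    for line in text.splitlines():
        match = TRACE_RE.match(line.strip())
        if not match:
            continue  # skip warnings and anything that is not a trace line
        address = int(match.group(1), 16)
        offset = int(match.group(3), 16)
        frames.append({
            "symbol": match.group(2),
            "address": address,
            "offset": offset,
            "size": int(match.group(4), 16),
            "start": address - offset,  # where the function begins
        })
    return frames


frames = parse_trace(KSYMOOPS_OUTPUT)

# Reversing the list gives the call order, outermost caller first.
call_chain = " -> ".join(frame["symbol"] for frame in reversed(frames))
print(call_chain)
for frame in frames:
    print(f'{frame["symbol"]:<16} starts at {frame["start"]:08x}')
```

Read this way, the trace says a poll() system call on a TCP socket ended up waiting in synchronize_bh, which fits the wait_on_bh message in the log.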

The network card is an HP 100VG-AnyLAN (driver hp100.o).

If needed, I can provide access (including root) on the spare dual CPU 
machine.

This machine is an Asus P2L97-DS with two P-II Deschutes CPUs at 333 MHz.
CPU#0 is stepping 0, CPU#1 is stepping 2.

As I said, we have two dual CPU systems. The other one shows the same symptoms,
but is an Asus P2B-DS with two identical P-III Katmai CPUs at 450 MHz, stepping 7.
I've never managed to get a log entry on that one, though. And since it's a
production machine, I'm no longer running X on it [1].

Paul Wouters
Xtended Internet

[1] I felt really awful running X on the NIS master to begin with :)
--
Broerdijk 27			Postbus 170		Tel: 31-24-360 39 19	
6523 GM Nijmegen		6500 AD Nijmegen	Fax: 31-24-360 19 99
The Netherlands			The Netherlands		info@xtdnet.nl
