Re: [Fedora-infrastructure-list] Hardware Trouble (?) on app1.fedora.phx.redhat.com

Jeffrey Tadlock <linux@xxxxxxxxxxxxx> · Sun, 10 Sep 2006 20:49:13 -0400

Warren Togami wrote:
> Both before and after the data center migration to a new rack and new
> switch, we have occasionally been experiencing network trouble to
> app1.fedora.phx.redhat.com.
>
> Since this happened both before and after the new switch, could this
> perhaps be hardware trouble?
>
> Any opinions of what we should do about this?  Perhaps...
>
> - More closely monitor, with ping logs over time?

Closer monitoring would probably be good.  Watching the console when
these unusual events occur might also provide some insight into what is
really happening.  From the IRC log it looks like lmacken was able to
produce some "oddities" by running an nmap scan of app1 and trying to
flush iptables?
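
For the ping logs, something along these lines might be enough to start
with.  This is only a rough sketch, not anything we have deployed: the
log path, the 60 second interval, and the function names are my own
placeholders, and the ping flags assume the Linux iputils ping.  It
writes one timestamped ok/FAIL line per probe, and outage_windows() can
then collapse a run of samples into start/end spans to line up against
the IRC logs.

```python
import subprocess
import time
from datetime import datetime, timezone

HOST = "app1.fedora.phx.redhat.com"  # host discussed in this thread
LOGFILE = "app1-ping.log"            # hypothetical log path
INTERVAL = 60                        # seconds between probes (assumed)


def probe(host):
    """Send one ICMP echo with the system ping; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "5", host],  # iputils: 1 packet, 5 s timeout
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (first_fail, last_fail) spans."""
    windows = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts          # first failure of a new outage
            last_fail = ts
        elif start is not None:
            windows.append((start, last_fail))
            start = None
    if start is not None:           # outage still open at end of samples
        windows.append((start, last_fail))
    return windows


def log_forever(host=HOST, logfile=LOGFILE, interval=INTERVAL):
    """Append one 'timestamp ok|FAIL' line per probe until interrupted."""
    with open(logfile, "a") as log:
        while True:
            stamp = datetime.now(timezone.utc).isoformat()
            status = "ok" if probe(host) else "FAIL"
            log.write(f"{stamp} {status}\n")
            log.flush()
            time.sleep(interval)
```

Running log_forever() from a box outside the rack (and ideally from a
second one inside it) would help tell a problem on app1 itself apart
from a problem on the path to it.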

The firewalls on these boxes have been a little unusual, to say the
least, for as long as I have worked on them.  Maybe just getting some of
the Pyroman configs rolled out will clear some of this up.

> - Migrate app1's job to a xen guest on xen2.  I assume we have more than
> enough capacity there?

I don't know if I would do that just yet.  app1 performs the exact same
duties as app2.  So even if app1 were to completely flip out for an
extended period of time, app2 would still be there to handle the
workload.  app2 handled the workload with no issues while app1 was
rebuilt.  Having app2 around should allow us to look a little closer at
the issue without introducing a server shuffle into things.

> Stacy will be in Arizona again later this month, so he will be able to
> take a look at our hardware then.

It might be worth seeing if Stacy can check the switch app1 is plugged 
into to see if it is reporting any unusual events.  If it is truly the 
NIC flaking out, I would think the switch would see some anomalies from 
these events as well.
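
In the meantime we could also look at the host side of the same link.
The sketch below is an assumption on my part rather than something
already in place: it reads the per-interface error and drop counters
that Linux exposes in /proc/net/dev.  The function names are made up
for illustration.  If the NIC really is flaking out, I would expect
these counters to climb around the time of an event.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into per-interface error/drop counters."""
    counters = {}
    for line in text.splitlines()[2:]:   # first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:             # 8 receive + 8 transmit columns
            continue
        counters[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return counters


def read_counters(path="/proc/net/dev"):
    """Read and parse the live counters on the local machine."""
    with open(path) as f:
        return parse_net_dev(f.read())
```

Taking a snapshot with read_counters() every so often and comparing the
numbers before and after an incident would give us something to set
against whatever the switch reports.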

As a side note, there may be some time between my replies... I am on
vacation and away from reliable Internet access for the first part of
this week.

--Jeffrey