Warren Togami wrote:
> Both before and after the data center migration to a new rack and new
> switch, we have occasionally been experiencing network trouble to
> app1.fedora.phx.redhat.com.
>
> Since this happened both before and after the new switch, could this
> perhaps be hardware trouble?
>
> Any opinions of what we should do about this? Perhaps...
>
> - More closely monitor, with ping logs over time?
Closer monitoring would probably be good. Seeing what is happening from
the console when these unusual events occur might also provide some
insight as to what is really happening. From the IRC log it looks like
lmacken was able to produce some "oddities" with an nmap scan of app1
and trying to flush iptables?
The firewalls on these boxes have been a little unusual, to say the
least, for as long as I have worked on them. Maybe just getting some of
the Pyroman configs rolled out will clear some of this up.
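
For the ping-log idea, something along these lines might be enough to
start with. This is only a rough sketch, not something that is deployed
anywhere: it assumes Python 3.7+ and the Linux iputils ping (the -c and
-W flags), and the interval, the log file name, and the function names
are all made up for illustration.

```python
#!/usr/bin/env python3
"""Rough ping logger: one probe per interval, one log line per probe."""
import re
import subprocess
import time
from datetime import datetime, timezone
from typing import Optional

HOST = "app1.fedora.phx.redhat.com"  # the host discussed in this thread
INTERVAL = 10                         # seconds between probes (assumed value)
LOGFILE = "app1-ping.log"             # hypothetical log file name

# Matches the per-reply "time=0.412 ms" field in iputils ping output.
RTT_RE = re.compile(r"time[=<]([\d.]+) ?ms")


def parse_rtt(ping_output: str) -> Optional[float]:
    """Return the round-trip time in ms, or None if no reply was seen."""
    match = RTT_RE.search(ping_output)
    return float(match.group(1)) if match else None


def format_line(stamp: str, host: str, rtt: Optional[float]) -> str:
    """Build one log line; probes without a reply are marked LOST."""
    status = "LOST" if rtt is None else f"{rtt:.1f}ms"
    return f"{stamp} {host} {status}"


def probe(host: str) -> Optional[float]:
    """Send one echo request (-c 1) and wait at most 2 seconds (-W 2)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        capture_output=True, text=True,
    )
    return parse_rtt(result.stdout) if result.returncode == 0 else None


def main() -> None:
    # Loop forever, appending one timestamped line per probe.
    while True:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        with open(LOGFILE, "a") as log:
            log.write(format_line(stamp, HOST, probe(HOST)) + "\n")
        time.sleep(INTERVAL)


if __name__ == "__main__":
    main()
```

Running it from a box other than app1 and grepping the log for LOST
around the time of the next incident should at least tell us whether
packets are really being dropped and for how long.
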
> - Migrate app1's job to a xen guest on xen2. I assume we have more than
> enough capacity there?
I don't know if I would do that just yet. app1 performs the exact same
duties as app2. So even if app1 were to completely flip out for an
extended period of time, app2 would still be there to handle the work
load. app2 handled the workload while app1 was rebuilt with no issues.
Having app2 around should allow us to look a little closer at the
issue without introducing a server shuffle into things.
> Stacy will be in Arizona again later this month, so he will be able to
> take a look at our hardware then.
It might be worth seeing if Stacy can check the switch app1 is plugged
into to see if it is reporting any unusual events. If it is truly the
NIC flaking out, I would think the switch would see some anomalies from
these events as well.
As a side note, it may be some time between replies... I am on vacation
and away from reliable Internet access for the first part of this week.
--Jeffrey