Turn port security on to get rid of the rogue machines. As Gordan
suggested, use a private interface for the cluster communications;
that will resolve the issue with the switch going down. If you use
a point-to-point NIC, you will have to reconfigure your cluster to
use the new node names assigned on the private LAN.
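
After switching to the private node names, it may be worth checking that each name really resolves to an address on the private LAN and not the campus network. Here is a rough sketch in Python. The function name, the node names and the subnet are all made up for illustration; nothing here comes from the cluster software itself.

```python
import ipaddress
import socket


def nodes_off_private_lan(node_names, private_net, resolve=socket.gethostbyname):
    """Return the node names that do not resolve to an address in private_net.

    node_names  -- hypothetical private node names, e.g. "node1-priv"
    private_net -- the private subnet in CIDR form (an assumption)
    resolve     -- lookup function; defaults to the system resolver
    """
    net = ipaddress.ip_network(private_net)
    stray = []
    for name in node_names:
        try:
            addr = ipaddress.ip_address(resolve(name))
        except (OSError, ValueError):
            # A name that cannot be resolved is treated as misconfigured.
            stray.append(name)
            continue
        if addr not in net:
            stray.append(name)
    return stray
```

Calling it with something like nodes_off_private_lan(["node1-priv", "node2-priv"], "192.168.100.0/24") on each node should return an empty list if the names point at the private interface.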
Andrew Lacey wrote:
> There's an argument that if your switch is down for 30 minutes, you
> have bigger problems. If you have a 30 minute switch outage, the chances
> are that you can live with the node power-up time on top of that.
Point taken, but the problem is that if there is a switch outage and the
nodes kill each other, then somebody has to come in, power the nodes back
on and make sure everything comes up OK. It would be much easier if the
nodes would just detect that the switch is down and wait patiently without
doing anything (since there is really nothing wrong with the nodes at all,
and if they just wait for the switch to come back, everything will be
fine.)
We do have a history of flaky networking here because we're a college...we
have a lot of machines on campus that we don't control (student-owned) and
we get weird traffic, rogue machines, etc. more frequently than a
locked-down corporate environment. I want to make sure that one of those
network events doesn't needlessly bring down our mail service, which is
what will be running on this cluster.
-Andrew L
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster