Andrey Slepuhin wrote:
Dear folks,
We are installing a large diskless cluster using CentOS 5.1. The
hardware is pretty new - Supermicro X7DWT boards with Harpertown CPUs.
Unfortunately we have some PXE-related problems described by the
following scenario:
1) Set up DHCP, TFTP and NFS on a server, prepare PXE kernel and initrd
- fine.
2) Start up the node using PXE for the first time - fine.
3) Reboot the node - PXE boot fails on all subsequent attempts. We see
that the server receives the DHCP requests and answers them, but the
node never responds to the offers with a DHCPREQUEST, so the exchange
never reaches the DHCP ack. The typical DHCP log is:
Jan 5 09:14:34 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan 5 09:14:34 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
Jan 5 09:14:36 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan 5 09:14:36 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
Jan 5 09:14:40 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan 5 09:14:40 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
Jan 5 09:14:48 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan 5 09:14:48 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
4) Restarting the DHCP server, resetting the node, or powering the node
off and on doesn't help.
5) The only thing that enables the system to boot over PXE again is to
run the "bmc reset cold" command on the node using ipmitool - yes, we
have an IPMI card sharing the same Ethernet interface. After that we can
boot CentOS again.
6) When Linux is loaded, if we reboot the node using "bmc power cycle"
instead of reboot or shutdown, the node boots the next time without
problems.
7) There are no problems with the second GbE interface (without IPMI).
8) So our guess is that Linux, on a reboot, leaves the Ethernet device
in some state that causes brain damage for the IPMI+PXE combination. We
tried playing with some e1000 driver options, and we also tried the
latest Intel driver - nothing helps.
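To spot affected nodes across a large cluster, the DISCOVER/OFFER loop
from step 3 can be detected directly in the dhcpd log. The following is
only a minimal sketch in Python, not a tested tool: the function names
count_messages and stuck_clients are made up for illustration, and it
assumes each log entry sits on a single (unwrapped) line in the format
shown above.

```python
import re
from collections import Counter, defaultdict

# DHCP message types we care about, as they appear in the dhcpd log.
MSG_RE = re.compile(r"\b(DHCP(?:DISCOVER|OFFER|REQUEST|ACK))\b")
# A colon-separated MAC address (six hex octets).
MAC_RE = re.compile(r"\b([0-9a-f]{2}(?::[0-9a-f]{2}){5})\b", re.IGNORECASE)


def count_messages(lines):
    """Return {mac: Counter of DHCP message types} for the given log lines."""
    counts = defaultdict(Counter)
    for line in lines:
        msg = MSG_RE.search(line)
        mac = MAC_RE.search(line)
        if msg and mac:
            counts[mac.group(1).lower()][msg.group(1)] += 1
    return counts


def stuck_clients(lines):
    """Return MACs that were offered a lease but never sent a DHCPREQUEST."""
    return sorted(
        mac
        for mac, c in count_messages(lines).items()
        if c["DHCPOFFER"] > 0 and c["DHCPREQUEST"] == 0
    )


# Sample input copied from the log above (lines joined).
sample = [
    "Jan 5 09:14:34 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1",
    "Jan 5 09:14:34 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1",
    "Jan 5 09:14:36 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1",
    "Jan 5 09:14:36 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1",
]

if __name__ == "__main__":
    print(stuck_clients(sample))
```

Run against the sample lines, this reports 00:30:48:7e:24:a6 as stuck,
since it was offered 10.1.5.2 twice and never requested it. A node that
completes the exchange would also log a DHCPREQUEST and would not be
listed.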
Do you have any idea what is going wrong? Any help will be much appreciated.
I don't, but we don't share the IPMI interface with PXE and the OS, i.e.
we set things up so that IPMI uses the first interface, the node boots
via PXE off the second, and the OS uses the second interface only.
We do this because we had problems using IPMI Serial-Over-LAN (SOL) and
console redirection over SOL with PXE - the PXE boot would reset the NIC
and break the SOL connection - so we gave up and decided to separate
IPMI from PXE and the OS.
James Pearson