On Fri, 2006-05-26 at 09:27 -0400, Randy Grimshaw wrote:
> 
> I am trying to run a Linux high availability cluster (failover pair)
> using serial as one of the heartbeats.
> 
> Due to numerous serial overruns the systems are actually crashing
> periodically.
> 
> This is a very frustrating development for a system intended to provide
> HA. (Certainly not ha ha ha.)
> 
> I have updated to the latest BIOS.
> I have checked RTS DTS XON XOFF etc.
> This is happening with the stock and custom kernels.
> This is happening on three pairs of servers.
> The serial ports are detected as:
>      Serial: 8250/16550 driver $Revision: 1.90 $ 32 ports, IRQ sharing enabled
>      serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> 
> Any advice would be greatly appreciated.

The most common cause of overruns is running too high a baud rate. Remember, 16550s only have a 16-byte receive FIFO. With the usual 8N1 framing each character is 10 bits on the wire, so at 38,400 baud a character arrives about every 260 microseconds and the 16-byte FIFO fills in a little over 4 milliseconds. At 9600 baud each character takes a tiny bit over 1 millisecond and the FIFO takes about 17 milliseconds to fill. The slower the line, the more time the CPU has to service the UART's interrupt before characters are lost.

Flow control tries to prevent overruns. Without it, if the machine is busy, the interrupt from the chip may not be serviced in time, the FIFO fills up, and any further incoming characters are lost.

Dropping the baud rate should help, and make sure you use hardware (RTS/CTS) flow control. Remember that software (XON/XOFF) flow control requires the CPU to watch the buffer and send an XOFF when it gets full. You're already overrunning the buffer because the CPU isn't keeping up, so software flow control won't help.

Heartbeat traffic between nodes in a cluster is NOT the place to scrimp and save money! NICs are relatively cheap, after all. They have much bigger buffers and use DMA to move data into memory instead of transferring it one byte at a time over the I/O ports. Frankly, NICs are far more reliable, especially for something this critical.
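If you want to check the timing arithmetic yourself: a character takes (start bit + data bits + parity bit + stop bits) / baud seconds on the wire, and filling the FIFO takes 16 times that. The sketch below assumes 8N1 framing and a 16-byte receive FIFO; the function name `char_time_us` is just illustrative.

```python
def char_time_us(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Microseconds to receive one character: start bit + data + parity + stop."""
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return bits_per_char / baud * 1e6


FIFO_DEPTH = 16  # bytes in the 16550A receive FIFO

# Per-character time and time to fill the whole FIFO at common line rates.
for baud in (9600, 19200, 38400, 115200):
    per_char = char_time_us(baud)
    print(f"{baud:>6} baud: {per_char:7.1f} us/char, "
          f"{per_char * FIFO_DEPTH / 1000:5.2f} ms to fill {FIFO_DEPTH} bytes")
```

At 38,400 baud this works out to roughly 260 microseconds per character and about 4.2 ms for all 16 bytes; at 9600 baud it is a bit over 1 ms per character and about 17 ms for the FIFO. That window is how long the interrupt handler has, in the worst case, before data gets dropped.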
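For reference, here's roughly what "lower speed, RTS/CTS on, XON/XOFF off" means at the termios level, sketched in Python with the standard `termios` module. `CRTSCTS` isn't POSIX, so this assumes Linux. The helper names `heartbeat_attrs` and `open_heartbeat_port`, the default `/dev/ttyS0` and the 19,200 baud choice are all hypothetical, not taken from your setup. Hardware flow control also only works if the null-modem cable actually carries RTS and CTS between the two boxes.

```python
import os
import termios


def heartbeat_attrs(attrs, speed=termios.B19200):
    """Hypothetical helper: take a termios.tcgetattr() list and return a
    modified copy with the given line speed, 8N1 framing, RTS/CTS hardware
    flow control enabled and XON/XOFF software flow control disabled."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs

    # Software flow control off: it depends on the CPU, which is the bottleneck.
    iflag &= ~(termios.IXON | termios.IXOFF | termios.IXANY)

    # 8 data bits, no parity, 1 stop bit.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8

    # Hardware (RTS/CTS) flow control; ignore modem status lines; enable receiver.
    cflag |= termios.CRTSCTS | termios.CLOCAL | termios.CREAD

    # tcsetattr() takes the line speeds from the ispeed/ospeed slots.
    return [iflag, oflag, cflag, lflag, speed, speed, list(cc)]


def open_heartbeat_port(path="/dev/ttyS0", speed=termios.B19200):
    """Hypothetical helper: open a serial port and apply heartbeat_attrs()."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    try:
        attrs = termios.tcgetattr(fd)
        termios.tcsetattr(fd, termios.TCSANOW, heartbeat_attrs(attrs, speed))
    except Exception:
        os.close(fd)
        raise
    return fd
```

If the cluster software is Linux-HA Heartbeat, it opens and configures the serial port itself and the speed is normally chosen with the `baud` directive in ha.cf, so treat this as a way to see what the settings mean rather than something to run alongside it.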
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens@xxxxxxxxxxxxxxx -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-        The world is coming to an end ... SAVE YOUR FILES!!!        -
----------------------------------------------------------------------

-- 
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list