Tom,

I currently administer a similar but larger setup, so I may be able to help you.

First, make sure you contact Coraid. They are really good about helping with this sort of thing.

Second, have you looked at /dev/etherd/err? There is usually a lot of useful debugging output there.

Third, have you upgraded the firmware on the Coraid and built the newest AoE driver? Both are critical for getting the best performance and reliability, and the stock kernel driver has generally fallen behind. Coraid assure me they're working on this, and I can vouch that their driver is essentially the one in the kernel plus the development needed to make it work, not some sort of vendor-supplied out-of-tree driver.

Finally, make sure you have good switches. I have had a number of switches that drop a packet here and there, and that is death to AoE performance. Gigabit is generally a must as well.

On Dec 10, 2006, at 2:03 AM, bigendian+gfs@xxxxxxxxx wrote:

> I've just set up a new two-node GFS cluster on a CORAID sr1520
> ATA-over-Ethernet. My nodes are each quad dual-core Opteron CPU
> systems with 32GB RAM each. The CORAID unit exports a 1.6TB block
> device that I have a GFS file system on.

-- 
Jayson Vantuyl
Systems Architect
Engine Yard
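P.S. On the dropped-packet point: one rough way to see whether frames are being lost is to snapshot the host NIC's error counters before and after a test run. The sketch below is only an illustration, not anything Coraid ships. It assumes a Linux host that exposes counters under /sys/class/net/<iface>/statistics/, and the interface name in the usage note is a guess you will need to change. It only sees what the host NIC reports, so drops inside a switch still have to be checked on the switch's own port counters.

```python
import os

# Counters the Linux kernel exposes per interface under
# /sys/class/net/<iface>/statistics/
COUNTERS = ("rx_dropped", "rx_errors", "tx_dropped", "tx_errors")


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for one interface.

    Counters that are missing or unreadable are skipped, so an unknown
    interface yields an empty dict rather than an exception.
    """
    stats_dir = os.path.join(sysfs_root, iface, "statistics")
    result = {}
    for name in COUNTERS:
        try:
            with open(os.path.join(stats_dir, name)) as f:
                result[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue
    return result


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }
```

To use it, call read_counters("eth0") (interface name assumed) before and after a heavy read or write against the AoE device and pass both snapshots to counter_deltas. An empty result means the host saw no new drops or errors during the run, which does not by itself clear the switch.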
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster