On Fri, Nov 13, 2009 at 03:38:04PM -0800, Swift, Jon S PWR wrote:
> All,
> I have a 2 node test cluster made up of Dell 1850's with only
> virtual IP's as services supporting NFS on 3 GFS2 file systems using
> RHEL5U4 64 bit. Both nodes of the cluster export/share all 3 file
> systems all the time. When I create a NFS load that reduces the CPU
> %idle to less than 75% (as shown by top or vmstat) I have problems with
> my cluster crashing. I'm using iozone to generate the load from separate
> NFS clients.

Are the clients using NFSv3 or NFSv4? NFSv4 might do better.

> The higher the load on the cluster the more often this
> happens. Under a very heavy load it will fail within 5 minutes. But with
> a light load, CPU %idle above 75% I see no problems. One system logs
> messages like the following, the other one crashes. Most of this CPU
> load is I/O wait time. The private network connecting my 2 node cluster
> together is currently a cat5 cross over cable. I tried a 10/100/1000 hub
> as well, but with it in I was logging collisions. The private network is
> using IP's 192.168.15.1 (hostname ic-cnfs01) and 192.168.15.2 (hostname
> ic-cnfs02). The storage is an EMC CX3-40, with PowerPath supporting the
> logical volumes the GFS2 file systems are built on.
>
> How do I prevent this condition from happening? Thanks in
> advance.
>
>
> Nov 13 11:39:14 cnfs01 openais[5817]: [TOTEM] The token was lost in the
> OPERATIONAL state.

This is the standard message you get when the token doesn't arrive
within the timeout period. You could try increasing the token timeout
to, say, 30 seconds (the default is 10 seconds). The value is given in
milliseconds and goes inside the <cluster> element, as shown here:

> The cluster.conf file is below
>
> <?xml version="1.0"?>
> <cluster alias="cnfs_cluster" config_version="78" name="cnfs">

    <totem token="30000"/>

> The openais.conf file is below

openais.conf is ignored when using cman.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
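
For reference, here is a rough sketch of how the token timeout change suggested in the reply might look once placed in cluster.conf. Only the first two lines and the totem element come from the thread. The bump of config_version from 78 to 79 is an assumption (the version normally has to be raised when the file is edited), and the rest of the poster's configuration is not reproduced here.

```xml
<?xml version="1.0"?>
<!-- config_version raised from 78 to 79: an assumption, not from the thread -->
<cluster alias="cnfs_cluster" config_version="79" name="cnfs">
    <!-- token timeout in milliseconds (30000 ms = 30 s) -->
    <totem token="30000"/>
    <!-- remaining elements of the original cluster.conf stay unchanged -->
</cluster>
```

The attribute value needs to be quoted for the file to remain well-formed XML, and the updated file has to reach both nodes before the new timeout takes effect.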