Hello!

2008/12/15 Rainer Schwemmer <rainer.schwemmer at cern.ch>:
> Hello all,
>
> I am trying to set up a file replication scheme for our cluster of about
> 2000 nodes. I'm not sure if what I am doing is actually feasible, so
> I'll best just start from the beginning and maybe one of you knows even
> a better way to do this.
>
> Basically we have a farm of 2000 machines which are running a certain
> application that, during start up, reads about 300 MB of data (out of a
> 6 GB repository) of program libraries, geometry data etc., and this 8
> times per node - once per core on every machine. The data is not modified
> by the program, so it can be regarded as read-only. When the application
> is launched, it is launched on all nodes simultaneously, and especially
> now during debugging this is done very often (within minutes).
[...]
> The interconnect between nodes is TCP/IP over Ethernet.

I apologize in advance for not saying much about advanced GlusterFS
setups in this post. :-)

Before trying a multi-level AFR I'd rule out that a basic AFR setup
would not be able to do the job. Try TSTTCPW (The Simplest Thing That
Could Possibly Work) - and do some benchmarks. IMHO, anything faster
than your NFS server would be an improvement.

One setup might be an AFR'd volume on node A and the "B" nodes,
exported to the clients like server-side AFR
(http://www.gluster.org/docs/index.php/Setting_up_AFR_on_two_servers_with_server_side_replication).
Using 20 "B" nodes, each one would have ~100 clients. Re-exporting the
AFR'd GlusterFS volume over NFS would make changes to the client nodes
unnecessary.

<different ideas>

When I read '2000 machines' and 'read only' I thought of this page:
http://wiki.systemimager.org/index.php/BitTorrent#Benchmark

Would it be possible to use some peer-to-peer software to distribute
the program and data files to the local disks?

I don't have any experience with networks of that size, so I did some
calculations using optimistic estimated values:

Given 300 MB of data per core, 8 cores per node, 2000 nodes and one NFS
server over Gigabit Ethernet estimated at a maximum of 100 MB/s, the
data transfer for start-up would take 3 s/core = 24 s/node = 48000 s
total = ~13.3 hours. Is that anywhere near the time it really takes, or
did I misread some information?

With 10 Gigabit Ethernet and an NFS server powerful enough to use it,
that time might be reduced by a factor of 10 to ~1.3 hours.

Using Gigabit Ethernet and running BitTorrent on every node, a 6 GB tar
of the complete repository might be downloaded and unpacked to all the
local disks within less than 2 hours. Using a compressed file might be
a lot faster, depending on the compression ratio.


Harald Stürzebecher
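
P.S.: In case anyone wants to re-check the back-of-the-envelope numbers
above or plug in their own bandwidth figures, here is a small Python
sketch of the same calculation. The 100 MB/s and 1000 MB/s server
throughputs are just the optimistic estimates used above, not
measurements.

    #!/usr/bin/env python
    # Rough start-up transfer estimates for a single central server.
    # All numbers are optimistic guesses, not measurements.

    DATA_PER_CORE_MB = 300      # read by each core at application start-up
    CORES_PER_NODE   = 8
    NODES            = 2000

    def total_hours(server_mb_per_s):
        """Time to push the start-up data to every core through one server."""
        total_mb = DATA_PER_CORE_MB * CORES_PER_NODE * NODES
        return total_mb / server_mb_per_s / 3600.0

    # Single NFS server on Gigabit Ethernet (~100 MB/s)
    print("1 GbE NFS server:  %.1f hours" % total_hours(100.0))   # ~13.3 h

    # Single NFS server on 10 Gigabit Ethernet (~1000 MB/s)
    print("10 GbE NFS server: %.1f hours" % total_hours(1000.0))  # ~1.3 h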