Barry,

Just to clarify: the application that would cache files on GlusterFS would do it across regular mount points, and not copy off from the backend servers, right? If that is the case, then that is fine. Since you mentioned such a small partition, my guess is that you are using SSDs on the 128 cache nodes. Is that correct?

Since you can re-generate or retrieve files from the upstream file server seamlessly, I would recommend not using replication and instead configuring a 2X cache using a distribute configuration. If there are enough files and the application is caching files that are in demand, they will spread out nicely over the 128 nodes and give you a good load-balancing effect.

With replication, suppose you have two replicas, like you mentioned: the write goes to both replica servers, and the read for a file goes to a preferred server. There is no load balancing per file per se. What I mean is, suppose 100 clients mount a volume that is replicated across 2 servers; if all of them access the same file in read mode, it will be read from the same server and will not be balanced across the 2 servers.

This, however, can be fixed by using a client-preferred read server - but this would have to be set on each client. Also, it works only for a replication count of 2. It does not allow a preference list of servers - for example, with a replica count of 3, it would not allow one client to give a preference of s1, s2, s3, another client a preference of s2, s3, s1, the next one a preference of s3, s1, s2, and so on and so forth. At some point we intend to automate some of that, but since most users use a replication count of 2 only, it can be managed - except for the work required to set preferences on each client. Again, if there are lots of files being accessed, it evens out, so that becomes less of a concern and gives a load-balanced effect.

So in summary: reads of the same file do not get balanced unless each client sets a preference. However, when many files are being accessed, it evens out and gives a load-balanced effect. Since you are only going to write once, replication does not hurt write performance much (a replicated write returns only after the write has happened to both replica locations).
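For reference, the preferred read server is set on the cluster/replicate translator in the client volfile. If I remember the option name correctly it is read-subvolume; here is a minimal sketch against your existing volume names (each client would point it at a different brick to spread reads across the pair):

volume replicate001-17
  type cluster/replicate
  # this client prefers c001b17-1 for reads; use c002b17-1 in the volfile on other clients
  option read-subvolume c001b17-1
  subvolumes c001b17-1 c002b17-1
end-volume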
Since you are still in the testing phase, what you can do is this: create one backend FS on each node, and create two directories in it - one called distribute, and the other called something like replica<volume><replica#> so you can use it to group with a similar directory on another node for replication. The backend subvolumes exported from the servers can be directories, so you can set up a distribute GlusterFS volume as well as the replicated GlusterFS volumes, mount both on the clients, and test both. At the point when you have decided to use one of them, just unmount the other one, delete its directory from the backend FS, and that's it. (A rough sketch of such a server volfile is at the bottom of this mail, below your quoted config.)

If you have SSDs, as I assumed, you would actually be decreasing wear per cached data (if there were such a term :-)) by not using replication.

Let me know if you have any questions on this.

Regards,
Tejas.

----- Original Message -----
From: "Barry Robison" <barry.robison at drdstudios.com>
To: gluster-users at gluster.org
Sent: Wednesday, March 10, 2010 5:28:24 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: advice on optimal configuration

Hello,

I have 128 physically identical blades, with a 1GbE uplink per blade and 10GbE between chassis (32 blades per chassis). Each node will have an 80GB gluster partition. Dual quad-core Intel Xeons, 24GB RAM.

The goal is to use gluster as a cache for files used by render applications. All files in gluster could be re-generated or retrieved from the upstream file server.

My first volume config attempt is 64 replicated volumes with partner pairs on different chassis. Is replicating a performance hit? Do reads balance between replication nodes? Would NUFA make more sense for this set-up?

Here is my config, any advice appreciated.

Thank you,
-Barry

>>>>

volume c001b17-1
  type protocol/client
  option transport-type tcp
  option remote-host c001b17
  option transport.socket.nodelay on
  option transport.remote-port 6996
  option remote-subvolume brick1
  option ping-timeout 5
end-volume
.
<snip>
.
volume c004b48-1
  type protocol/client
  option transport-type tcp
  option remote-host c004b48
  option transport.socket.nodelay on
  option transport.remote-port 6996
  option remote-subvolume brick1
  option ping-timeout 5
end-volume

volume replicate001-17
  type cluster/replicate
  subvolumes c001b17-1 c002b17-1
end-volume
.
<snip>
.
volume replicate001-48
  type cluster/replicate
  subvolumes c001b48-1 c002b48-1
end-volume

volume replicate003-17
  type cluster/replicate
  subvolumes c003b17-1 c004b17-1
end-volume
.
<snip>
.
volume replicate003-48
  type cluster/replicate
  subvolumes c003b48-1 c004b48-1
end-volume

volume distribute
  type cluster/distribute
  subvolumes replicate001-17 replicate001-18 replicate001-19 replicate001-20 replicate001-21 replicate001-22 replicate001-23 replicate001-24 replicate001-25 replicate001-26 replicate001-27 replicate001-28 replicate001-29 replicate001-30 replicate001-31 replicate001-32 replicate001-33 replicate001-34 replicate001-35 replicate001-36 replicate001-37 replicate001-38 replicate001-39 replicate001-40 replicate001-41 replicate001-42 replicate001-43 replicate001-44 replicate001-45 replicate001-46 replicate001-47 replicate001-48 replicate003-17 replicate003-18 replicate003-19 replicate003-20 replicate003-21 replicate003-22 replicate003-23 replicate003-24 replicate003-25 replicate003-26 replicate003-27 replicate003-28 replicate003-29 replicate003-30 replicate003-31 replicate003-32 replicate003-33 replicate003-34 replicate003-35 replicate003-36 replicate003-37 replicate003-38 replicate003-39 replicate003-40 replicate003-41 replicate003-42 replicate003-43 replicate003-44 replicate003-45 replicate003-46 replicate003-47 replicate003-48
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 64MB
  option flush-behind on
  subvolumes distribute
end-volume

volume readahead
  type performance/read-ahead
  option page-count 4
  subvolumes writebehind
end-volume

volume iocache
  type performance/io-cache
  option cache-size 128MB
  option cache-timeout 10
  subvolumes readahead
end-volume

volume quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 64kB
  subvolumes iocache
end-volume

volume statprefetch
  type performance/stat-prefetch
  subvolumes quickread
end-volume
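As mentioned above, here is a rough sketch of how each node's server volfile could export two directory-backed bricks from the same backend FS - one for the distribute test volume and one for this node's replica pair. The directory paths and brick names are only illustrative, so adjust them to your layout:

volume posix-distribute
  type storage/posix
  # backend directory used for the distribute test volume
  option directory /mnt/gluster/distribute
end-volume

volume posix-replica
  type storage/posix
  # backend directory used for this node's replica pair
  option directory /mnt/gluster/replica001-1
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.listen-port 6996
  option auth.addr.posix-distribute.allow *
  option auth.addr.posix-replica.allow *
  subvolumes posix-distribute posix-replica
end-volume

On the clients, the protocol/client volumes for the distribute test would use option remote-subvolume posix-distribute, and the ones for the replicated test would use option remote-subvolume posix-replica; you can then mount both volumes and compare them side by side.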