I am just starting to play with Gluster, but I think I can give you some
answers from my experience.

On Thursday 21 October 2010 17:09:32 Rudi Ahlers wrote:
> Hi all,
>
> I'm considering setting up Gluster, and have a few questions if you don't
> mind.
>
>
> 1. Which option is better? I already have a few CentOS 5.5. server
> setup. Would it be better to just install GlusterFS, or to install
> Gluster Storage Platform from scratch? How / where can I see a full
> comparison between the 2? Are there any performance / management
> benefits in choosing the one of the other?
>

The Gluster Storage Platform is built on GlusterFS. The platform is a
complete OS (Fedora Linux) + GlusterFS + web management in a single package
that can be installed via USB in a few minutes. It is supposed to simplify
installation, setup and management of GlusterFS clusters, but I could not
get it to work properly. I was unable to add new servers: every time I
pressed the "add new server" button I got an error saying "Could not retrive
installer ip address". Since the platform is relatively new, there is
next to no documentation or issue reports about it. Also, servers and
volumes added via the command line never showed up in the web-based GUI.

So I installed Ubuntu 10.10 LTS and built GlusterFS 3.1 from source, and
handling the servers, volumes, etc. via the new command line is a breeze.

> 2. I need reliability and speed. From what I understand, I could setup
> 2 servers to work similar to software RAID1 (mirroring). Is it also
> correct to assume that I could use 4 servers in a RAID10 / 1+0 type
> setup? But then obviously serverA & serverB will be mirrored, and
> serverC & serverD together? What happens to the data? Does it get
> filled randomly between the 2 sets of servers, or does it get put onto
> serverA & B first, till it's full then move over to C & D?
>

I only have two servers for testing. What you set up are volumes, and each
volume can be configured depending on your needs.
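A rough sketch of what setting up a volume on the 3.1 command line can look
like, written as a dry run: the functions only print the commands, so nothing
is changed on the machine. The host names (server1, server2), the brick path
/export/brick1 and the volume name testvol are made up for illustration, and
the command syntax is assumed from the 3.1 CLI (peer probe, volume create,
volume start), so check it against the documentation for your version.

```shell
#!/bin/sh
# Dry-run sketch: each function only prints a gluster command line, so the
# script is safe to run on a machine without GlusterFS installed.
# server1/server2, /export/brick1 and "testvol" are hypothetical names.

# Print the command that adds a server to the trusted pool.
probe_cmd() {
    echo "gluster peer probe $1"
}

# Print the command that creates a replicated volume with one replica
# per brick. $1 is the volume name, the rest are bricks (host:/path).
replica_create_cmd() {
    vol=$1
    shift
    echo "gluster volume create $vol replica $# transport tcp $*"
}

# Print the command that starts a volume so clients can mount it.
start_cmd() {
    echo "gluster volume start $1"
}

probe_cmd server2
replica_create_cmd testvol server1:/export/brick1 server2:/export/brick1
start_cmd testvol
```

Running the printed commands as root on server1 would, under these
assumptions, give a two-way mirrored volume; leaving out "replica N" should
give a plain distributed volume instead.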
This is what I understand so far:

Distributed volume: Aggregates the storage of several directories (bricks in
Gluster terms) across several computers. The benefit is that you can grow or
shrink the volume as you please. The downside is that it offers no
performance or reliability guarantees, as files are stored randomly among
the disks in the volume.

Replicated volume: Requires a minimum of 2 bricks on separate servers. All
files are replicated among the bricks. The number of replicas is configured
at volume creation. Has all the benefits of a Distributed volume plus
failure resilience.

Stripe volume: Requires a minimum of 2 bricks on separate servers. All files
are split into stripes, and these stripes are distributed among the bricks
of the volume. The number of stripes and their size are configured at volume
creation. Has all the benefits of a Replicated volume plus reliability, and
can improve read performance for large files, as the read is distributed
among several machines.

> 3. Has anyone noticed any considerable differences in using 1x 1GB NIC
> & 2x 1GB NIC's bonded together? Or should I rather use a Quad port NIC
> if / where possible?
>
> 4. How do clients (i.e. users) connect if I want to give them normal
> FTP / SMB / NFS access? Or do I need to mount the exported Gluster to
> another Linux server first which runs these services already?
>

Gluster 3.1 has a native NFS v3 implementation, so you can mount any Gluster
volume as a normal NFS mount. For SMB you need to configure Samba to share
the volume, and you can easily access the files on any of the bricks via SCP
or FTP if you have an SSH or FTP server configured. For Linux the
recommended way is to use the glusterfs module to mount the volume as a
Gluster file system.

> 5. If there's 10 Gluster servers, for example, with a lot of data
> spread out across them. How do the clients connect, exactly? I.e.
> do they all connect to a central server which then just "fetches and
> delivers" the content to the clients, or do the client's connect
> directly to the specific server where their content is? i.e. is the
> network traffic split evenly across the servers, according to where
> the data is stored?
>

This is also something I would like to know. When connecting clients I use
the command

  mount -t [nfs|glusterfs] <ip-address>:<volume-name> /mount/point

where ip-address is the IP of any of the servers that have the volume
configured.

It is not clear to me how the reliability part works here. If I disconnect
the server with that IP address, I lose access to the files. True, the files
are still accessible via the other servers, but I have to manually point the
mount at another server, which is not exactly high availability.

> tia :)

--
regards,
Horacio Sanson