On Mon, Jul 18, 2011 at 10:53 AM, Remi Broemeling <remi at goclio.com> wrote:
> Hi,
>
> We've been using GlusterFS to manage shared files across a number of hosts
> in the past few months and have run into a few problems -- basically one
> every month, roughly. The problems are occasionally extremely difficult to
> track down to GlusterFS, as they often masquerade as something else in the
> application log files that we have. The problems have been one instance of
> split-brain and then a number of instances of "stuck" files (i.e. any stat
> calls would block for an hour and then time out with an error) as well as a
> couple of instances of "ghost" files (remove the file, but GlusterFS
> continues to show it for a little while until the cache times out).
>
> We do not place a large amount of load on GlusterFS, and don't have any
> significant performance issues to deal with. With that in mind, the core
> question of this e-mail is: "How can I modify our configuration to be the
> absolute most stable (problem free) that it can be, even if it means
> sacrificing performance?" In sum, I don't have any particular performance

It depends on the kind of bugs or issues you are encountering. There might be
a solution for some bugs but not for others.

> concerns at this moment, but the GlusterFS bugs that we encounter are quite
> problematic -- so I'm willing to entertain any suggested stability
> improvement, even if it has a negative impact on performance (I suspect that
> the answer here is just "turn off all performance-enhancing gluster
> caching", but I wanted to validate that is actually true before going so
> far). Thus please suggest anything that could be done to improve the
> stability of our setup -- as an aside, I think that this would be an
> advantageous thing to add to the FAQ. Right now the FAQ contains
> information for performance tuning, but not for stability tuning.
>
> Thanks for any help that you can give/suggestions that you can make.
>
> Here are the details of our environment:
>
> OS: RHEL5
> GlusterFS Version: 3.1.5
> Mount method: glusterfsd/FUSE
> GlusterFS Servers: web01, web02
> GlusterFS Clients: web01, web02, dj01, dj02
>
> $ sudo gluster volume info
>
> Volume Name: shared-application-data
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: web01:/var/glusterfs/bricks/shared
> Brick2: web02:/var/glusterfs/bricks/shared
> Options Reconfigured:
> network.ping-timeout: 5
> nfs.disable: on
>
> Configuration File Contents:
> /etc/glusterd/vols/shared-application-data/shared-application-data-fuse.vol
> volume shared-application-data-client-0
>     type protocol/client
>     option remote-host web01
>     option remote-subvolume /var/glusterfs/bricks/shared
>     option transport-type tcp
>     option ping-timeout 5
> end-volume
>
> volume shared-application-data-client-1
>     type protocol/client
>     option remote-host web02
>     option remote-subvolume /var/glusterfs/bricks/shared
>     option transport-type tcp
>     option ping-timeout 5
> end-volume
>
> volume shared-application-data-replicate-0
>     type cluster/replicate
>     subvolumes shared-application-data-client-0 shared-application-data-client-1
> end-volume
>
> volume shared-application-data-write-behind
>     type performance/write-behind
>     subvolumes shared-application-data-replicate-0
> end-volume
>
> volume shared-application-data-read-ahead
>     type performance/read-ahead
>     subvolumes shared-application-data-write-behind
> end-volume
>
> volume shared-application-data-io-cache
>     type performance/io-cache
>     subvolumes shared-application-data-read-ahead
> end-volume
>
> volume shared-application-data-quick-read
>     type performance/quick-read
>     subvolumes shared-application-data-io-cache
> end-volume
>
> volume shared-application-data-stat-prefetch
>     type performance/stat-prefetch
>     subvolumes shared-application-data-quick-read
> end-volume
>
> volume shared-application-data
>     type debug/io-stats
>     subvolumes shared-application-data-stat-prefetch
> end-volume
>
> /etc/glusterfs/glusterd.vol
> volume management
>     type mgmt/glusterd
>     option working-directory /etc/glusterd
>     option transport-type socket,rdma
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
> end-volume
>
> --
> Remi Broemeling
> System Administrator
> Clio - Practice Management Simplified
> 1-888-858-2546 x(2^5) | remi at goclio.com
> www.goclio.com | blog | twitter | facebook
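
If you do want to try the "turn off all performance-enhancing gluster caching"
route you mention, one way is to disable the client-side performance
translators with "gluster volume set" rather than editing the generated
volfiles by hand. A rough sketch of what that could look like is below; I am
going from memory on the exact option names, so please verify that your 3.1.5
build accepts them before relying on this:

# disable the performance/caching translators on the volume
$ sudo gluster volume set shared-application-data performance.quick-read off
$ sudo gluster volume set shared-application-data performance.stat-prefetch off
$ sudo gluster volume set shared-application-data performance.io-cache off
$ sudo gluster volume set shared-application-data performance.read-ahead off
$ sudo gluster volume set shared-application-data performance.write-behind off

Anything that takes effect should show up under "Options Reconfigured" in
"gluster volume info", and remounting the clients afterwards is a cheap way to
make sure they are actually running the simplified volume graph.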