Hi,

We've been using GlusterFS to manage shared files across a number of hosts for the past few months and have run into a few problems -- roughly one per month. The problems are occasionally extremely difficult to trace back to GlusterFS, as they often masquerade as something else in our application log files. So far we have seen one instance of split-brain, a number of instances of "stuck" files (i.e. any stat call against the file would block for an hour and then time out with an error), and a couple of instances of "ghost" files (the file has been removed, but GlusterFS continues to show it for a little while until the cache times out).

We do *not* place a large amount of load on GlusterFS, and we don't have any significant performance issues to deal with. With that in mind, the core question of this e-mail is: "How can I modify our configuration to be the absolute *most* stable (problem-free) that it can be, even if it means sacrificing performance?"

In sum, I don't have any particular performance concerns at the moment, but the GlusterFS bugs that we encounter are quite problematic -- so I'm willing to entertain any suggested stability improvement, even if it has a negative impact on performance. (I suspect that the answer here is just "turn off all of the performance-enhancing gluster caching" -- my guess at what that would actually look like is sketched below, after the client volfile -- but I wanted to validate that before going so far.) So please suggest anything that could be done to improve the stability of our setup. As an aside, I think this would be a worthwhile addition to the FAQ: right now the FAQ contains information on *performance* tuning, but nothing on *stability* tuning.

Thanks for any help you can give or suggestions you can make. Here are the details of our environment:

OS: RHEL5
GlusterFS Version: 3.1.5
Mount method: glusterfsd/FUSE
GlusterFS Servers: web01, web02
GlusterFS Clients: web01, web02, dj01, dj02

$ sudo gluster volume info

Volume Name: shared-application-data
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: web01:/var/glusterfs/bricks/shared
Brick2: web02:/var/glusterfs/bricks/shared
Options Reconfigured:
network.ping-timeout: 5
nfs.disable: on

Configuration File Contents:

*/etc/glusterd/vols/shared-application-data/shared-application-data-fuse.vol*

volume shared-application-data-client-0
    type protocol/client
    option remote-host web01
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-client-1
    type protocol/client
    option remote-host web02
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume

volume shared-application-data-write-behind
    type performance/write-behind
    subvolumes shared-application-data-replicate-0
end-volume

volume shared-application-data-read-ahead
    type performance/read-ahead
    subvolumes shared-application-data-write-behind
end-volume

volume shared-application-data-io-cache
    type performance/io-cache
    subvolumes shared-application-data-read-ahead
end-volume

volume shared-application-data-quick-read
    type performance/quick-read
    subvolumes shared-application-data-io-cache
end-volume

volume shared-application-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes shared-application-data-quick-read
end-volume

volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-stat-prefetch
end-volume
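To make the question concrete, here is my (possibly naive) guess at what "turning off the caching" would look like -- one "volume set" per performance translator in the client volfile above. I'm not certain that all of these keys exist or are settable on 3.1.5, so please correct me if this is wrong or incomplete:

$ sudo gluster volume set shared-application-data performance.write-behind off
$ sudo gluster volume set shared-application-data performance.read-ahead off
$ sudo gluster volume set shared-application-data performance.io-cache off
$ sudo gluster volume set shared-application-data performance.quick-read off
$ sudo gluster volume set shared-application-data performance.stat-prefetch off

Assuming those take effect, that would leave a client graph of basically just io-stats -> replicate -> protocol/client. Is that a sane/supported configuration to run, or are some of the performance translators effectively required?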
*/etc/glusterfs/glusterd.vol*

volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
end-volume

--
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | remi at goclio.com
www.goclio.com | blog <http://www.goclio.com/blog> | twitter <http://www.twitter.com/goclio> | facebook <http://www.facebook.com/goclio>