Hi All,

Thanks for the great feedback. I had changed the IPs, and when checking the logs I noticed one server wasn't connecting correctly. To rule out any mistakes on my part I've re-done the bricks from scratch with clean configurations; the mount info is attached below. It is still not performing 'great' compared to a single NFS mount.

For the application we're running, the files don't change; we only add / delete files, so I'd like to get directory / file info cached as much as possible.

Config info:

gluster> volume info data-storage

Volume Name: data-storage
Type: Replicate
Volume ID: cc91c107-bdbb-4179-a097-cdd3e9d5ac93
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs1:/data/storage
Brick2: fs2:/data/storage
gluster>

On my web1 node I mounted:

# mount -t glusterfs fs1:/data-storage /storage

I've copied my data over to it again, and running ls several times takes ~0.5 seconds:

[@web1 files]# time ls -all|wc -l
1989

real    0m0.485s
user    0m0.022s
sys     0m0.109s

[@web1 files]# time ls -all|wc -l
1989

real    0m0.489s
user    0m0.016s
sys     0m0.116s

[@web1 files]# time ls -all|wc -l
1989

real    0m0.493s
user    0m0.018s
sys     0m0.115s

Doing the same thing on the raw OS files on one node takes 0.021s:

[@fs2 files]# time ls -all|wc -l
1989

real    0m0.021s
user    0m0.007s
sys     0m0.015s

[@fs2 files]# time ls -all|wc -l
1989

real    0m0.020s
user    0m0.008s
sys     0m0.013s

A full recursive directory listing seems even slower:

[@web1 files]# time ls -alR|wc -l
2242956

real    74m0.660s
user    0m20.117s
sys     1m24.734s

[@web1 files]# time ls -alR|wc -l
2242956

real    26m27.159s
user    0m17.387s
sys     1m11.217s

[@web1 files]# time ls -alR|wc -l
2242956

real    27m38.163s
user    0m18.333s
sys     1m19.824s

Just as a crazy reference, on another single server with SSDs (RAID 10) I get the following for the same operation (and this server has even more files):

files# time ls -alR|wc -l
2260484

real    0m15.761s
user    0m5.170s
sys     0m7.670s

My goal is to get this directory listing as fast as possible. I don't have the hardware / budget to test an SSD configuration, but would an SSD setup give me a ~1 minute directory listing time (assuming the Gluster mount stays about 4 times slower than a single node)? If I added two more bricks to the cluster (distributed-replicated), would this double the read speed?

Thanks for any insight!
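One more thing -- since the files never change once they're written, my plan is to lean harder on the caches. Below is only a sketch of what I intend to try: the option names are what I could find for 3.3 and the timeout / size values are guesses, so please correct me if any of them are wrong or unsafe on a replicated volume.

Remount the client with longer FUSE attribute / entry caching:

# mount -t glusterfs -o attribute-timeout=600,entry-timeout=600 fs1:/data-storage /storage

Raise the io-cache size and keep metadata caching (stat-prefetch / md-cache) enabled on the volume:

gluster> volume set data-storage performance.cache-size 1GB
gluster> volume set data-storage performance.cache-refresh-timeout 60
gluster> volume set data-storage performance.stat-prefetch on

And for my last question, I assume growing from 1x2 to a 2x2 distributed-replicate layout would be done with something like the following (fs3 / fs4 being two hypothetical new servers):

gluster> volume add-brick data-storage fs3:/data/storage fs4:/data/storage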
-------------------- storage.log from web1 on mount ---------------------

[2012-06-07 20:47:45.584320] I [glusterfsd.c:1666:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.0
[2012-06-07 20:47:45.624548] I [io-cache.c:1549:check_cache_size_ok] 0-data-storage-quick-read: Max cache size is 8252092416
[2012-06-07 20:47:45.624612] I [io-cache.c:1549:check_cache_size_ok] 0-data-storage-io-cache: Max cache size is 8252092416
[2012-06-07 20:47:45.628148] I [client.c:2142:notify] 0-data-storage-client-0: parent translators are ready, attempting connect on transport
[2012-06-07 20:47:45.631059] I [client.c:2142:notify] 0-data-storage-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume data-storage-client-0
  2:     type protocol/client
  3:     option remote-host fs1
  4:     option remote-subvolume /data/storage
  5:     option transport-type tcp
  6: end-volume
  7:
  8: volume data-storage-client-1
  9:     type protocol/client
 10:     option remote-host fs2
 11:     option remote-subvolume /data/storage
 12:     option transport-type tcp
 13: end-volume
 14:
 15: volume data-storage-replicate-0
 16:     type cluster/replicate
 17:     subvolumes data-storage-client-0 data-storage-client-1
 18: end-volume
 19:
 20: volume data-storage-write-behind
 21:     type performance/write-behind
 22:     subvolumes data-storage-replicate-0
 23: end-volume
 24:
 25: volume data-storage-read-ahead
 26:     type performance/read-ahead
 27:     subvolumes data-storage-write-behind
 28: end-volume
 29:
 30: volume data-storage-io-cache
 31:     type performance/io-cache
 32:     subvolumes data-storage-read-ahead
 33: end-volume
 34:
 35: volume data-storage-quick-read
 36:     type performance/quick-read
 37:     subvolumes data-storage-io-cache
 38: end-volume
 39:
 40: volume data-storage-md-cache
 41:     type performance/md-cache
 42:     subvolumes data-storage-quick-read
 43: end-volume
 44:
 45: volume data-storage
 46:     type debug/io-stats
 47:     option latency-measurement off
 48:     option count-fop-hits off
 49:     subvolumes data-storage-md-cache
 50: end-volume
+------------------------------------------------------------------------------+
[2012-06-07 20:47:45.642625] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 0-data-storage-client-0: changing port to 24009 (from 0)
[2012-06-07 20:47:45.648604] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 0-data-storage-client-1: changing port to 24009 (from 0)
[2012-06-07 20:47:49.592729] I [client-handshake.c:1636:select_server_supported_programs] 0-data-storage-client-0: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
[2012-06-07 20:47:49.595099] I [client-handshake.c:1636:select_server_supported_programs] 0-data-storage-client-1: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
[2012-06-07 20:47:49.608455] I [client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-0: Connected to 10.1.80.81:24009, attached to remote volume '/data/storage'.
[2012-06-07 20:47:49.608489] I [client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-06-07 20:47:49.608572] I [afr-common.c:3627:afr_notify] 0-data-storage-replicate-0: Subvolume 'data-storage-client-0' came back up; going online.
[2012-06-07 20:47:49.608837] I [client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-0: Server lk version = 1
[2012-06-07 20:47:49.616381] I [client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-1: Connected to 10.1.80.82:24009, attached to remote volume '/data/storage'.
[2012-06-07 20:47:49.616434] I [client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-06-07 20:47:49.621808] I [fuse-bridge.c:4193:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-06-07 20:47:49.622793] I [client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-1: Server lk version = 1
[2012-06-07 20:47:49.622873] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-06-07 20:47:49.623440] I [afr-common.c:1964:afr_set_root_inode_on_first_lookup] 0-data-storage-replicate-0: added root inode

-------------------- End storage.log -----------------------------------------------------

On Thu, Jun 7, 2012 at 9:46 AM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:

> hi Brian,
>     'stat' command comes as fop (File-operation) 'lookup' to the gluster
> mount which triggers self-heal. So the behavior is still same.
>     I was referring to the fop 'stat' which will be performed only on one of
> the bricks.
>     Unfortunately most of the commands and fops have same name.
> Following are some of the examples of read-fops:
> .access
> .stat
> .fstat
> .readlink
> .getxattr
> .fgetxattr
> .readv
>
> Pranith.
> ----- Original Message -----
> From: "Brian Candler" <B.Candler at pobox.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "olav johansen" <luxis2012 at gmail.com>, gluster-users at gluster.org, "Fernando Frediani (Qube)" <fernando.frediani at qubenet.net>
> Sent: Thursday, June 7, 2012 7:06:26 PM
> Subject: Re: Performance optimization tips Gluster 3.3? (small files / directory listings)
>
> On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:
> > Brian,
> >     Small correction: 'sending queries to *both* servers to check they are
> > in sync - even read accesses.' Read fops like stat/getxattr etc are sent to
> > only one brick.
>
> Is that new behaviour for 3.3? My understanding was that stat() was a
> healing operation.
>
> http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate
>
> If this is no longer true, then I'd like to understand what happens after a
> node has been down and comes up again. I understand there's a self-healing
> daemon in 3.3, but what if you try to access a file which has not yet been
> healed?
>
> I'm interested in understanding this, especially the split-brain scenarios
> (better to understand them *before* you're stuck in a problem :-)
>
> BTW I'm in the process of building a 2-node 3.3 test cluster right now.
>
> Cheers,
>
> Brian.
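P.S. On the self-heal discussion above: as far as I understand, 3.3 also exposes heal status through the CLI. Treat the commands below as a sketch -- I haven't verified the exact syntax on my own cluster yet:

gluster> volume heal data-storage info
gluster> volume heal data-storage
gluster> volume heal data-storage full

The first should list entries still pending heal, the second should trigger healing of just those entries, and 'full' should force a crawl of the whole volume.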