Write-behind is not being used in your configuration. You need to chain the performance translators.

avati

> I've tested several clustered file systems (OCFS2, XSAN, GFS2) and I
> really like the simplicity (unify/stripe translators) and portability
> of Gluster. However, I'm getting poor performance versus NFS if I use
> bs=4k with dd, but if I use bs=128k then the performance is
> comparable to gigE NFS. It's still nowhere near the speed of going
> directly to the storage, but that's OK because everything is going
> over TCP/IP when using Gluster. Here's the test on the storage itself
> (15-disk RAID5 on an Infortrend Eonstor A16F-G2221 2Gb FC <-> 4Gb FC
> QLogic switch <-> 4Gb FC QLogic HBA on the server "porpoise" running
> XFS on the RAID5):
>
> 90 porpoise:/export/eon0/tmp% time dd if=/dev/zero of=testFile bs=4k count=500000
> 2048000000 bytes (2.0 GB) copied, 9.37949 s, 218 MB/s
>
> Here's the NFS mount going over gigE (server and client are the same):
>
> porpoise-san:/export/eon0 on /mnt/eon0 type nfs (rw,addr=10.2.179.3)
>
> Here's the test:
>
> 93 porpoise:/mnt/eon0/tmp% time dd if=/dev/zero of=testFile bs=4k count=500000
> 2048000000 bytes (2.0 GB) copied, 25.7614 s, 79.5 MB/s
>
> Basically I'm looking for something comparable to the NFS test above
> with Gluster. Here's the mount:
>
> glusterfs 5.1T 3.6G 5.1T 1% /export/glfs
>
> Here's the test:
>
> 88 porpoise:/export/glfs/tmp% time dd if=/dev/zero of=testFile bs=4k count=50000
> 204800000 bytes (205 MB) copied, 17.7291 s, 11.6 MB/s
> 0.106u 0.678s 0:17.73 4.3% 0+0k 0+0io 0pf+0w
>
> The data size was reduced for the GlusterFS test because I didn't want
> to wait :). But if I increase the bs, the speed gets better:
>
> 99 porpoise:/export/glfs/tmp% time dd if=/dev/zero of=testFile bs=64k count=27500
> 1802240000 bytes (1.8 GB) copied, 26.4466 s, 68.1 MB/s
>
> If I increase bs to 128k, the performance is even better:
>
> 100 porpoise:/export/glfs/tmp% time dd if=/dev/zero of=testFile bs=128k count=13750
> 1802240000 bytes (1.8 GB) copied, 21.2332 s, 84.9 MB/s
>
> How can I tell the Gluster server or client to use a default
> read/write block size of 128k or more? With NFS there are the rsize
> and wsize options which I believe accomplish the same thing.
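To spell out what "chaining" means here: the closest GlusterFS analog to NFS's rsize/wsize is write-behind's aggregate-size (and read-ahead's page-size on the read side), but those translators only matter if they actually sit between the mountpoint and the protocol/client volume. In the spec files below nothing is ever stacked on top of them, so they are never in the I/O path. A minimal, untested sketch for the eon0 brick only, reusing the names and option values from your specs (the eon0-wb / eon0-ra names are just illustrative; the other bricks would be wrapped the same way):

volume eon0
  type protocol/client
  option transport-type tcp/client
  option remote-host porpoise-san
  option remote-subvolume eon0
end-volume

# write-behind layered ON TOP of the client volume, so 4k application
# writes get aggregated into larger network writes before they leave the box
volume eon0-wb
  type performance/write-behind
  option aggregate-size 131072   # roughly the "wsize" knob
  option flush-behind on
  subvolumes eon0
end-volume

# read-ahead layered on top of write-behind (the "rsize" side)
volume eon0-ra
  type performance/read-ahead
  option page-size 131072
  option page-count 16
  subvolumes eon0-wb
end-volume

The client then has to mount the top of that chain (eon0-ra here, or whatever cluster translator you eventually put above it); if I remember the 1.3.x behaviour correctly, the glusterfs client drives the topmost volume of the spec unless you name one explicitly, and any volume not reachable from that top volume is loaded but never used.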
> Here's my setup, I've dumped the non-relevant bricks as well:
>
> #### glusterfs-server.vol ####
>
> volume eon0
>   type storage/posix
>   option thread-count 8
>   option cache-size 1024MB
>   option directory /export/eon0
> end-volume
>
> volume eon1
>   type storage/posix
>   option directory /export/eon1
> end-volume
>
> volume eon2
>   type storage/posix
>   option directory /export/eon2
> end-volume
>
> volume glfs-ns
>   type storage/posix
>   option directory /export/glfs-ns
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   #option aggregate-size 131072  # in bytes
>   option aggregate-size 1MB      # default is 0bytes
>   option flush-behind on         # default is 'off'
>   subvolumes eon0
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option aggregate-size 131072   # in bytes
>   subvolumes eon1
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option aggregate-size 131072   # in bytes
>   subvolumes eon2
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option aggregate-size 131072   # in bytes
>   subvolumes glfs-ns
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 65536   ### in bytes
>   option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes eon0
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 65536   ### in bytes
>   option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes eon1
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 65536   ### in bytes
>   option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes eon2
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 65536   ### in bytes
>   option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes glfs-ns
> end-volume
>
> volume iothreads
>   type performance/io-threads
>   option thread-count 4    # default is 1
>   option cache-size 64MB
>   subvolumes eon0
> end-volume
>
> volume iothreads
>   type performance/io-threads
>   option thread-count 4    # default is 1
>   option cache-size 64MB
>   subvolumes eon1
> end-volume
>
> volume iothreads
>   type performance/io-threads
>   option thread-count 4    # default is 1
>   option cache-size 64MB
>   subvolumes eon2
> end-volume
>
> volume iothreads
>   type performance/io-threads
>   option thread-count 4    # default is 1
>   option cache-size 64MB
>   subvolumes glfs-ns
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   option auth.ip.eon0.allow 10.2.179.*
>   option auth.ip.eon1.allow 10.2.179.*
>   option auth.ip.eon2.allow 10.2.179.*
>   option auth.ip.glfs-ns.allow 10.2.179.*
>   subvolumes eon0 eon1 eon2 glfs-ns
> end-volume
>
> ####
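Two things stand out in the server spec. First, all four write-behind volumes are named "writebehind" (and likewise for readahead and iothreads); every volume needs a unique name. Second, protocol/server exports the bare posix volumes (subvolumes eon0 eon1 eon2 glfs-ns), so none of the performance volumes are ever in the path. On the server side, io-threads directly over each posix volume is usually all you need; write-behind and read-ahead tend to do the most good on the client. A sketch for one brick only, with made-up names (eon0-posix is illustrative), just to show the chaining:

volume eon0-posix
  type storage/posix
  option directory /export/eon0
end-volume

# io-threads chained on top of the posix volume
volume eon0
  type performance/io-threads
  option thread-count 4
  option cache-size 64MB
  subvolumes eon0-posix
end-volume

# export the *top* of the chain; auth is set on the exported name
volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.eon0.allow 10.2.179.*
  subvolumes eon0
end-volume

Keeping the name eon0 on the io-threads volume (and pushing the posix volume down to eon0-posix) means the remote-subvolume lines in the client spec don't have to change.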
> #### glusterfs-client.vol ####
>
> volume eon0
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host porpoise-san
>   option remote-subvolume eon0
> end-volume
>
> volume eon1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host porpoise-san
>   option remote-subvolume eon1
> end-volume
>
> volume eon2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host porpoise-san
>   option remote-subvolume eon2
> end-volume
>
> volume glfs-ns
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host porpoise-san
>   option remote-subvolume glfs-ns
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   #option aggregate-size 131072  # in bytes
>   option aggregate-size 1MB      # default is 0bytes
>   option flush-behind on         # default is 'off'
>   subvolumes eon0
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option aggregate-size 131072   # in bytes
>   subvolumes eon1
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option aggregate-size 131072   # in bytes
>   subvolumes eon2
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option aggregate-size 131072   # in bytes
>   subvolumes glfs-ns
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 1MB
>   option page-count 2
>   #option page-size 65536   ### in bytes
>   #option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes eon0
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 65536   ### in bytes
>   option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes eon1
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 65536   ### in bytes
>   option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes eon2
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-size 65536   ### in bytes
>   option page-count 16     ### memory cache size is page-count x page-size per file
>   subvolumes glfs-ns
> end-volume
>
> volume io-cache
>   type performance/io-cache
>   option cache-size 64MB   # default is 32MB
>   option page-size 1MB     # 128KB is default option
>   #option priority *.h:3,*.html:2,*:1   # default is '*:0'
>   option priority *:0
>   option force-revalidate-timeout 2     # default is 1
>   subvolumes eon0
> end-volume
>
> #volume unify0
> #  type cluster/unify
> #  option scheduler rr   # round robin
> #  option namespace glfs-ns
> #  subvolumes eon0 eon1 eon2
> #  subvolumes eon0
> #end-volume
>
> #volume stripe0
> #  type cluster/stripe
> #  option block-size *:1MB
> #  subvolumes eon0 eon1 eon2
> #end-volume
>
> ####
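If and when you re-enable unify (or stripe), I'd also put a single write-behind / read-ahead / io-cache chain above the cluster translator rather than one per brick, so the whole namespace goes through one set of performance translators. Roughly, as a sketch only (the volume name wb is illustrative):

volume unify0
  type cluster/unify
  option scheduler rr
  option namespace glfs-ns
  subvolumes eon0 eon1 eon2
end-volume

volume wb
  type performance/write-behind
  option aggregate-size 131072
  option flush-behind on
  subvolumes unify0
end-volume

and then mount the topmost volume of that chain.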
> I've tried everything from a very basic server/client setup with no
> translators to the setup above, and almost everything in between, to
> try to improve the performance. The server/client system is an Apple
> XServe G5 running Gentoo PPC64:
>
> Linux porpoise 2.6.24.4 #6 Sun Jul 20 00:16:04 CDT 2008 ppc64 PPC970FX, altivec supported RackMac3,1 GNU/Linux
>
> % cat /proc/cpuinfo
> processor       : 0
> cpu             : PPC970FX, altivec supported
> clock           : 2000.000000MHz
> revision        : 3.0 (pvr 003c 0300)
> timebase        : 33333333
> platform        : PowerMac
> machine         : RackMac3,1
> motherboard     : RackMac3,1 MacRISC4 Power Macintosh
> detected as     : 339 (XServe G5)
> pmac flags      : 00000000
> L2 cache        : 512K unified
> pmac-generation : NewWorld
>
> % cat /proc/meminfo
> MemTotal:      2006988 kB
> MemFree:        107864 kB
> Buffers:           676 kB
> Cached:        1775800 kB
> SwapCached:          0 kB
> Active:          46672 kB
> Inactive:      1762528 kB
> SwapTotal:     3583928 kB
> SwapFree:      3583624 kB
> Dirty:               0 kB
> Writeback:           0 kB
> AnonPages:       32744 kB
> Mapped:          10292 kB
> Slab:            65704 kB
> SReclaimable:    50620 kB
> SUnreclaim:      15084 kB
> PageTables:       1180 kB
> NFS_Unstable:        0 kB
> Bounce:              0 kB
> CommitLimit:   4587420 kB
> Committed_AS:   261212 kB
> VmallocTotal: 8589934592 kB
> VmallocUsed:      6352 kB
> VmallocChunk: 8589928088 kB
>
> Here's what's under the hood:
>
> # lspci
> 0000:f0:0b.0 Host bridge: Apple Computer Inc. U3H AGP Bridge
> 0001:00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 0001:00:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 0001:00:03.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
> 0001:00:04.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
> 0001:00:05.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
> 0001:00:06.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
> 0001:00:07.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
> 0001:01:07.0 Class ff00: Apple Computer Inc. K2 KeyLargo Mac/IO (rev 60)
> 0001:02:0b.0 USB Controller: NEC Corporation USB (rev 43)
> 0001:02:0b.1 USB Controller: NEC Corporation USB (rev 43)
> 0001:02:0b.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
> 0001:03:0d.0 Class ff00: Apple Computer Inc. K2 ATA/100
> 0001:03:0e.0 FireWire (IEEE 1394): Apple Computer Inc. K2 FireWire
> 0001:05:0c.0 IDE interface: Broadcom K2 SATA
> 0001:06:02.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
> 0001:06:03.0 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to PCI-X HBA (rev 02)
> 0001:06:03.1 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to PCI-X HBA (rev 02)
> 0001:07:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
> 0001:07:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
>
> This is with glusterfs-1.3.10 and fuse-2.7.3glfs10 compiled from source.
> Any help would be greatly appreciated.
>
> Thanks,
> Sabuj Pattanayek
> Senior SysAdmin
> http://structbio.vanderbilt.edu
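In short: on the client, chain write-behind (plus read-ahead/io-cache if you want them) over the protocol/client volumes, or over unify once you bring it back; on the server, chain io-threads over each posix volume and export the top of each chain; and make sure the glusterfs client is actually mounting the topmost volume of the client spec (via --volume-name / -n, if I remember the 1.3 option correctly). With write-behind actually in the I/O path and an aggregate-size of 128KB or more, the bs=4k dd run should behave much more like your bs=128k run, since the small writes get coalesced before they hit the network.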