Thanks for the suggestion! My test files are a few bytes each, so the
max-file-size setting doesn't apply, unfortunately. Disabling quick-read for
the gluster client mount does give slightly lower latency, but it still isn't
better than the Gluster NFS client (or the native Linux NFS server, for that
matter).

Cheers!
//Willem

On Wed, Apr 24, 2013 at 5:31 AM, Raghavendra Gowdappa
<rgowdapp at redhat.com> wrote:

> Hi Willem,
>
> Please find the inlined comments:
>
> ----- Original Message -----
> > From: "Willem" <gwillem at gmail.com>
> > To: gluster-users at gluster.org
> > Sent: Thursday, April 18, 2013 11:58:46 PM
> > Subject: Low (<0.2ms) latency reads, is it possible at all?
> >
> > I'm testing GlusterFS viability for use with a typical PHP webapp
> > (i.e. lots of small files). I don't care so much for the C in the CAP
> > theorem, as I have very few writes. I could live with a write
> > propagation delay of 5 minutes (or dirty caches for up to 5 minutes).
> >
> > So I'm optimizing for low-latency reads of small files. My test setup
> > is 2-node replication. Each node is both server and gluster client,
> > and both are in sync. I stop glusterfs-server on node2. On node1, I
> > run a simple benchmark: repeatedly (to prime the cache) open and close
> > 1000 small files. I have enabled the client-side io-cache and
> > quick-read translators (see below for the config).
> >
> > The results are consistently 2 ms per open (O_RDONLY) call, which is
> > too slow, unfortunately, as I need < 0.2 ms.
> >
> > With the same test against a local Gluster server over an NFS mount,
> > I get somewhat better performance, but still 0.6 ms.
> >
> > With the same test against a Linux NFS server (v3) and a local mount,
> > I get 0.12 ms per open.
> >
> > I can't explain the lag with Gluster, because I can't see any traffic
> > being sent to node2. I would expect that with the io-cache translator
> > and local-only operation, performance would approach that of the
> > kernel FS cache.
> >
> > Is this assumption correct? If yes, how would I profile the client
> > subsystem to find the bottleneck?
> >
> > If not, then I have to accept that 0.8 ms open calls are the best I
> > could squeeze out of this system. Then I'll probably look into AFS,
> > userspace async replication, or a gluster NFS mount with cachefilesd.
> > Which would you recommend?
> >
> > Thanks a lot!
> > BTW I like Gluster a lot, and I hope that it is also suitable for this
> > small-files use case ;)
> >
> > //Willem
> >
> > PS I'm testing with kernel 3.5.0-17-generic 64-bit and gluster
> > 3.2.5-1ubuntu1.
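For reference, a minimal per-call probe (a sketch, not part of the original
thread) that isolates the single open()/close() pair behind the per-open
figures discussed above; the mount path and file name below are assumptions
borrowed from the benchmark script quoted further down:

#!/usr/bin/env python
# Sketch: time individual open()/close() calls rather than a whole batch,
# so average and worst-case latency can be compared against the 0.2 ms goal.
# The file path is an assumption; point it at any small file on the mount.
import time

def probe_open_latency(filename, rounds=1000):
    total = 0.0
    worst = 0.0
    for _ in range(rounds):
        t1 = time.time()
        f = open(filename)  # O_RDONLY, the call being measured
        f.close()
        dt = time.time() - t1
        total += dt
        if dt > worst:
            worst = dt
    print 'avg %.3f ms, worst %.3f ms per open' % (
        total / rounds * 1000.0, worst * 1000.0)

if __name__ == '__main__':
    probe_open_latency('/mnt/glusterfs/test_000')

Reopening the same file keeps the caches warm after the first round, so the
average should mostly reflect translator and FUSE round-trip cost rather
than disk.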
> >
> > Client volfile:
> > +------------------------------------------------------------------------------+
> >   1: volume testvol-client-0
> >   2:     type protocol/client
> >   3:     option remote-host g1
> >   4:     option remote-subvolume /data
> >   5:     option transport-type tcp
> >   6: end-volume
> >   7:
> >   8: volume testvol-client-1
> >   9:     type protocol/client
> >  10:     option remote-host g2
> >  11:     option remote-subvolume /data
> >  12:     option transport-type tcp
> >  13: end-volume
> >  14:
> >  15: volume testvol-replicate-0
> >  16:     type cluster/replicate
> >  17:     subvolumes testvol-client-0 testvol-client-1
> >  18: end-volume
> >  19:
> >  20: volume testvol-write-behind
> >  21:     type performance/write-behind
> >  22:     option flush-behind on
> >  23:     subvolumes testvol-replicate-0
> >  24: end-volume
> >  25:
> >  26: volume testvol-io-cache
> >  27:     type performance/io-cache
> >  28:     option max-file-size 256KB
> >  29:     option cache-timeout 60
> >  30:     option priority *.php:3,*:0
> >  31:     option cache-size 256MB
> >  32:     subvolumes testvol-write-behind
> >  33: end-volume
> >  34:
> >  35: volume testvol-quick-read
> >  36:     type performance/quick-read
>
> The default value for the quick-read "max-file-size" option is 64KB. It
> seems your files are bigger than 64KB. Can you add this option and rerun
> the tests? Also, can you rerun the tests with quick-read disabled and
> compare the results?
>
> >  37:     option cache-size 256MB
> >  38:     subvolumes testvol-io-cache
> >  39: end-volume
> >  40:
> >  41: volume testvol
> >  42:     type debug/io-stats
> >  43:     option latency-measurement off
> >  44:     option count-fop-hits off
> >  45:     subvolumes testvol-quick-read
> >  46: end-volume
> >
> > Server volfile:
> > +------------------------------------------------------------------------------+
> >   1: volume testvol-posix
> >   2:     type storage/posix
> >   3:     option directory /data
> >   4: end-volume
> >   5:
> >   6: volume testvol-access-control
> >   7:     type features/access-control
> >   8:     subvolumes testvol-posix
> >   9: end-volume
> >  10:
> >  11: volume testvol-locks
> >  12:     type features/locks
> >  13:     subvolumes testvol-access-control
> >  14: end-volume
> >  15:
> >  16: volume testvol-io-threads
> >  17:     type performance/io-threads
> >  18:     subvolumes testvol-locks
> >  19: end-volume
> >  20:
> >  21: volume testvol-marker
> >  22:     type features/marker
> >  23:     option volume-uuid bc89684f-569c-48b0-bc67-09bfd30ba253
> >  24:     option timestamp-file /etc/glusterd/vols/testvol/marker.tstamp
> >  25:     option xtime off
> >  26:     option quota off
> >  27:     subvolumes testvol-io-threads
> >  28: end-volume
> >  29:
> >  30: volume /data
> >  31:     type debug/io-stats
> >  32:     option latency-measurement off
> >  33:     option count-fop-hits off
> >  34:     subvolumes testvol-marker
> >  35: end-volume
> >  36:
> >  37: volume testvol-server
> >  38:     type protocol/server
> >  39:     option transport-type tcp
> >  40:     option auth.addr./data.allow *
> >  41:     subvolumes /data
> >  42: end-volume
> >
> > My benchmark to simulate PHP webapp I/O:
> >
> > #!/usr/bin/env python
> >
> > import sys
> > import os
> > import time
> > import optparse
> >
> > def print_timing(func):
> >     def wrapper(*arg):
> >         t1 = time.time()
> >         res = func(*arg)
> >         t2 = time.time()
> >         print '%-15.15s %6d ms' % (func.func_name, int((t2 - t1) * 1000.0))
> >         return res
> >     return wrapper
> >
> >
> > def parse_options():
> >     parser = optparse.OptionParser()
> >     parser.add_option("--path", '-p', default="/mnt/glusterfs",
> >         help="Base directory for running tests (default: /mnt/glusterfs)",
> >         )
> >     parser.add_option("--num", '-n', type="int", default=100,
> > help="Number of files per test (default: 100)", > > ) > > (options, args) = parser.parse_args() > > return options > > > > class FSBench(): > > def __init__(self,path="/tmp",num=100): > > self.path = path > > self.num = num > > @print_timing > > def test_open_read(self): > > for filename in self.get_files(): > > f = open(filename) > > data = f.read() > > f.close() > > def get_files(self): > > for i in range(self.num): > > filename = self.path + "/test_%03d" % i > > yield filename > > @print_timing > > def test_stat(self): > > for filename in self.get_files(): > > os.stat(filename) > > > > @print_timing > > def test_stat_nonexist(self): > > for filename in self.get_files(): > > try: > > os.stat(filename+"blkdsflskdf") > > except OSError: > > pass > > @print_timing > > def test_write(self): > > for filename in self.get_files(): > > f = open(filename,'w') > > f.write('hi there\n') > > f.close() > > @print_timing > > def test_delete(self): > > for filename in self.get_files(): > > os.unlink(filename) > > if __name__ == '__main__': > > > > options = parse_options() > > bench = FSBench(path=options.path, num=options.num) > > bench.test_write() > > bench.test_open_read() > > bench.test_stat() > > bench.test_stat_nonexist() > > bench.test_delete() > > > > > > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > http://supercolony.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20130424/156b5975/attachment.html>