On Wednesday 31 March 2010, Jeremy Enos wrote:
> That, too, is what I'm guessing is happening. Besides official
> confirmation of what's going on, I'm mainly just after an answer as to
> whether there is a way to solve it and make a locally mounted single-disk
> Gluster fs perform even close to as well as a single local disk used
> directly, including for cached transactions. So far, the performance
> translators have had little impact in making small-block I/O performance
> competitive.

IMHO, this is a design issue with FUSE. When your client program (dd etc.)
makes a filesystem request, the following happens:

1. System call in the client program's process context.
2. Schedule, context switch to the glusterfs process, pass the request data.
3. Several rounds of scheduling and context switches between the glusterfs
   and glusterfsd processes, until the data has been requested and delivered
   to the glusterfs process,
   _or_
3a. Network connection to a remote glusterfsd, with even more latency.
4. Schedule and context switch back to the client program's process.

This has to happen for every file read. Bigger files mean less overhead per
KB; in particular, the synchronization overhead is much smaller - but we
still have to switch processes for every transferred block of data. The
bigger the read block size, the faster we get. This should show up in dd
with various bs= settings (see the sketch at the end of this mail).

Unix sockets could speed things up locally, but that would need a new
transport module. Caching in the glusterfs process avoids the extra
connection and scheduling to glusterfsd, but it still needs the other
switches to actually transfer the data to the client program, and it costs
extra memory and overhead for cache maintenance.

Direct local filesystem access, in comparison:

1. System call in the client program's process context; block to wait for
   data, schedule.
2. Disk access.
3. Wake up the process and return the data.

If no other process needs the CPU during the wait, there is not even a
context switch.

This should also make clear why native glusterfs access with Booster can be
much faster: it avoids many schedules and context switches, so it has much
lower latency. A filesystem implemented as a native kernel module would also
be much faster for the same reason, and it could use the kernel's inode,
dentry and caching infrastructure to avoid duplicated data (and probably the
memory leak we see in glusterfs).

Please note that I am only talking about latencies here, not about transfer
speed per block. The combined throughput of all concurrent accesses should
scale pretty well, but I have not tested that.

Amon Ott
--
Dr. Amon Ott - m-privacy GmbH
Am Köllnischen Park 1, 10179 Berlin
Tel: +49 30 24342334
Fax: +49 30 24342336
Web: http://www.m-privacy.de
Handelsregister: Amtsgericht Charlottenburg HRB 84946
Geschäftsführer: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky
GnuPG-Key-ID: EA898571
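
P.S.: Here is a rough, untested Python sketch of the kind of comparison I
mean, instead of plain dd: it times whole-file sequential reads at several
block sizes on two paths. The paths, block sizes and test files are only
placeholders; point them at a file on a local disk and at a same-sized file
on a glusterfs mount, and drop the page cache between runs if you want
cold-cache numbers.

    #!/usr/bin/env python
    # Rough sketch: compare sequential read throughput at different block
    # sizes on a local filesystem path vs. a glusterfs (FUSE) mount.
    # The paths below are placeholders; adjust them to your own setup.
    import time

    TEST_PATHS = [
        "/tmp/local-disk/testfile",    # placeholder: file on a local disk
        "/mnt/glusterfs/testfile",     # placeholder: file on the glusterfs mount
    ]
    BLOCK_SIZES = [4 * 1024, 64 * 1024, 1024 * 1024]   # 4 KiB, 64 KiB, 1 MiB

    def timed_read(path, bs):
        """Read the whole file in bs-sized chunks; unbuffered, so every
        read() is a separate system call travelling through FUSE."""
        total = 0
        calls = 0
        start = time.time()
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(bs)
                if not chunk:
                    break
                total += len(chunk)
                calls += 1
        return total, calls, time.time() - start

    for path in TEST_PATHS:
        for bs in BLOCK_SIZES:
            nbytes, calls, secs = timed_read(path, bs)
            print("%s  bs=%7d  %7d read() calls  %8.1f MB/s"
                  % (path, bs, calls, nbytes / secs / 1e6))

If the analysis above is right, the local-disk figures should barely depend
on the block size, while the numbers through the FUSE mount should drop
sharply at small block sizes, where the per-request scheduling and context
switch overhead dominates.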