On Wednesday 31 March 2010, Jeremy Enos wrote:
> That, too, is what I'm guessing is happening. Besides official
> confirmation of what's going on, I'm mainly just after an answer as to
> whether there is a way to solve it and make a locally mounted single-disk
> Gluster fs perform even close to as well as a single local disk used
> directly, including for cached transactions. So far, the performance
> translators have had little impact in making small-block I/O performance
> competitive.

IMHO, this is a design issue with FUSE. When your client program (dd etc.)
makes a filesystem request, the following happens:

1. System call in the client program's process context.
2. Schedule, context switch to the glusterfs process, pass the request data.
3. Several rounds of scheduling and context switches between the glusterfs
   and glusterfsd processes, until the data has been requested and delivered
   to the glusterfs process,
   _or_
3a. Network connection to a remote glusterfsd, with even more latency.
4. Schedule and context switch back to the client program's process.

This has to happen for every file read. Bigger files mean less overhead per
KB; in particular, the synchronization overhead is much smaller - but we
still have to switch processes for every transferred block of data. The
bigger the read block size, the faster we get. This should show up in dd
with various bs= settings (see the sketch at the end of this mail).

Unix sockets could speed things up locally, but that would need a new
transport module. Caching in the glusterfs process avoids the extra
connection and scheduling to glusterfsd, but it still needs the other
switches to actually transfer the data to the client program, and it costs
extra memory and overhead for cache maintenance.

Direct local filesystem access, in comparison:

1. System call in the client program's process context; block to wait for
   data, schedule.
2. Disk access.
3. Wake up the process and return the data.

If no other process needs the CPU during the wait, there is not even a
context switch.

This should also make clear why native glusterfs access with Booster can be
much faster: it avoids many schedules and context switches, so it has much
lower latency. A filesystem implemented as a native kernel module would also
be much faster for the same reason, and it could use the kernel's inode,
dentry and caching infrastructure to avoid duplicated data (and probably the
memory leak we see in glusterfs).

Please note that I am only talking about latencies here, not about transfer
speed per block. The combined throughput of all concurrent accesses should
scale pretty well, but I have not tested that.

Amon Ott
--
Dr. Amon Ott - m-privacy GmbH
Am Köllnischen Park 1, 10179 Berlin
Tel: +49 30 24342334
Fax: +49 30 24342336
Web: http://www.m-privacy.de
Handelsregister: Amtsgericht Charlottenburg HRB 84946
Geschäftsführer: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky
GnuPG-Key-ID: EA898571
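
P.S.: Here is a rough, untested Python sketch of the kind of comparison I
mean, instead of plain dd: it times whole-file sequential reads at several
block sizes on two paths. The paths, block sizes and test files are only
placeholders; point them at a file on a local disk and at a same-sized file
on a glusterfs mount, and drop the page cache between runs if you want
cold-cache numbers.

    #!/usr/bin/env python
    # Rough sketch: compare sequential read throughput at different block
    # sizes on a local filesystem path vs. a glusterfs (FUSE) mount.
    # The paths below are placeholders; adjust them to your own setup.
    import time

    TEST_PATHS = [
        "/tmp/local-disk/testfile",    # placeholder: file on a local disk
        "/mnt/glusterfs/testfile",     # placeholder: file on the glusterfs mount
    ]
    BLOCK_SIZES = [4 * 1024, 64 * 1024, 1024 * 1024]   # 4 KiB, 64 KiB, 1 MiB

    def timed_read(path, bs):
        """Read the whole file in bs-sized chunks; unbuffered, so every
        read() is a separate system call travelling through FUSE."""
        total = 0
        calls = 0
        start = time.time()
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(bs)
                if not chunk:
                    break
                total += len(chunk)
                calls += 1
        return total, calls, time.time() - start

    for path in TEST_PATHS:
        for bs in BLOCK_SIZES:
            nbytes, calls, secs = timed_read(path, bs)
            print("%s  bs=%7d  %7d read() calls  %8.1f MB/s"
                  % (path, bs, calls, nbytes / secs / 1e6))

If the analysis above is right, the local-disk figures should barely depend
on the block size, while the numbers through the FUSE mount should drop
sharply at small block sizes, where the per-request scheduling and context
switch overhead dominates.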