New Development Update - [1.4.X releases]

"Amar S. Tumballi" <amar@xxxxxxxxxxxxx> · Fri, 6 Jun 2008 14:24:02 -0700

Hi,
 As the subject says, I want to give you all a snapshot of what's coming in
the 1.4.x series and where our main focus is for the release.

1.4.x -
* Performance of GlusterFS (reduced CPU and memory usage, protocol
enhancements)
* Non-blocking I/O - to make GlusterFS more responsive and to remove the
issues caused by timeouts, freezes etc.
* A few features to handle different verticals of storage requirements.

The tarballs will be available from -
http://ftp.zresearch.com/pub/gluster/glusterfs/1.4-qa/
Please use only the latest tarball in that directory, and report bugs only
through the Savannah bug tracking system, so it's easier for us to track them.

You can switch to the 'glusterfs--mainline--3.0' branch (from which the
glusterfs-1.4.0qa releases are made) if you want to try the latest fixes.
Note that none of these are advised for production use yet.

That was a high-level description of what is coming. Here is a
module/translator-wise description of what's inside the tarball.

* nbio - (non-blocking I/O) This feature comes with drastic changes in the
transport layer, made to improve the responsiveness of the server and to
handle more connections. It is designed to scale to a higher number of
servers/clients.
  - NOTE that this QA release supports only the TCP/IP transport layer;
work is going on to port it to the ib-verbs module.
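
To illustrate the general idea behind non-blocking I/O (this is not the
GlusterFS transport code), here is a minimal Python sketch. It assumes a
POSIX system; the function name serve_ready and the three socket pairs are
made up for the example. The server side answers whichever peers have sent
data and never waits on the silent one.

```python
import selectors
import socket


def serve_ready(connections, timeout=1.0):
    """Answer every connection that has data, never waiting on a silent one."""
    sel = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)            # recv/send return at once
        sel.register(conn, selectors.EVENT_READ)
    answered = []
    # select() reports only the sockets that are readable right now.
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        request = conn.recv(4096)
        conn.send(request.upper())         # tiny reply, fits the socket buffer
        answered.append(connections.index(conn))
    sel.close()
    return sorted(answered)


# Three client/server pairs; the middle client stays silent (a "hung" peer).
pairs = [socket.socketpair() for _ in range(3)]
servers = [s for s, _c in pairs]
clients = [c for _s, c in pairs]
clients[0].send(b"stat")
clients[2].send(b"read")

answered = serve_ready(servers)
replies = [clients[i].recv(4096) for i in answered]
for s, c in pairs:
    s.close()
    c.close()
```

With a blocking recv() on each connection in turn, the silent peer would
stall the third one; here it is simply skipped.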

* binary protocol - this significantly reduces the amount of header/protocol
data transferred over the wire, and also reduces CPU usage as there is no
binary to ASCII (and vice versa) conversion involved at the protocol layer.
The difference may not be significant for large file performance, but for
small files and metadata operations this will be a phenomenal improvement.
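
As a rough illustration of why a binary header is cheaper than a text one,
here is a small Python sketch. The three header fields (op code, payload
length, call id) and their layout are assumptions made up for the example;
this is NOT the actual GlusterFS wire format.

```python
import struct

# Hypothetical header: op code, payload length, call id (network byte order).
HEADER = struct.Struct("!IIQ")


def encode_ascii(op, length, call_id):
    # Text header: each number is formatted as decimal digits.
    return f"op={op}\nlen={length}\nid={call_id}\n\n".encode("ascii")


def decode_ascii(raw):
    # The receiver has to split the text and parse every number back.
    fields = dict(line.split("=") for line in raw.decode("ascii").split())
    return int(fields["op"]), int(fields["len"]), int(fields["id"])


def encode_binary(op, length, call_id):
    # Fixed-size header: the integers are copied as raw bytes.
    return HEADER.pack(op, length, call_id)


def decode_binary(raw):
    return HEADER.unpack(raw)


msg = (17, 4096, 1234567890123)
ascii_header = encode_ascii(*msg)
binary_header = encode_binary(*msg)
```

The binary header here is always 16 bytes and needs no parsing, while the
text header grows with the number of digits in each field.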

* BDB storage translator - Some people want to keep huge numbers of small
files on a large storage volume, but for them the filesystem performance for
small files was the main bottleneck: the number of inodes spent, the overall
kernel overhead in the create cycle etc. were quite high. With the
introduction of BDB storage at the backend, we have tried to solve this
issue. It is aimed at giving a tremendous boost for cases where millions of
small files are in a single directory.
[NOTE: This is not POSIX compliant, as file attribute fops are not supported
on these files; however file rename, multiple directory levels and symlinks
are supported].
GlusterFS BDB options are here:
http://gluster.org/docs/index.php/GlusterFS_Translators_v1.3#Berkeley_DB
Also refer to this link -
http://www.oracle.com/technology/documentation/berkeley-db/db/gsg/CXX/dbconfig.html,
so you can tune BDB better. We are still investigating the performance
numbers for files bigger than the page size, but you can give it a try if
your average file size is well below the 100k mark.
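
The idea of keeping small files as records in a database, instead of one
backend file (and inode) each, can be sketched as below. This is only an
illustration: it uses Python's standard dbm.dumb module as a stand-in for
Berkeley DB, and the database name "storage_db" and the helper functions are
made up; it is not how the BDB translator is implemented.

```python
import dbm.dumb
import os
import tempfile


def store_small_files(directory, files):
    """Keep every small file as one record in a per-directory database."""
    db_path = os.path.join(directory, "storage_db")   # hypothetical name
    with dbm.dumb.open(db_path, "c") as db:
        for name, content in files.items():
            db[name.encode()] = content    # key: file name, value: file data
    return db_path


def read_small_file(db_path, name):
    with dbm.dumb.open(db_path, "r") as db:
        return db[name.encode()]


with tempfile.TemporaryDirectory() as backend_dir:
    files = {f"msg-{i:04d}.txt": f"body {i}".encode() for i in range(1000)}
    db_path = store_small_files(backend_dir, files)
    sample = read_small_file(db_path, "msg-0042.txt")
    # Count what actually lands on the backend filesystem.
    backend_entries = len(os.listdir(backend_dir))
```

The 1000 "files" end up in a handful of database files on the backend, so
the backend filesystem does not spend an inode or a create cycle per file.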

* libglusterfsclient - An API interface for glusterfs (for the file
create/write, open/write and open/read cases) which merges all these calls
into one fop and gives much better performance by removing the latency caused
by the many calls that happen in a single file I/O. (It is aimed at small
file performance, but users may have to write their own apps using
libglusterfsclient.)
Check this tool -
http://ftp.zresearch.com/pub/gluster/glusterfs/misc/glfs-bm.c
You need to compile it like ' # gcc -o glfs-bm glfs-bm.c -lglusterfsclient'
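
A back-of-the-envelope model shows why merging calls matters for small
files. The numbers are assumptions for illustration only, not measurements:
one network round trip per call, a 200 microsecond round trip, and three
separate calls per file (e.g. create, write, close) versus one merged fop.

```python
def transfer_time_us(num_files, round_trips_per_file, rtt_us):
    """Latency-bound cost of handling many small files, in microseconds."""
    return num_files * round_trips_per_file * rtt_us


RTT_US = 200          # assumed network round-trip time
NUM_FILES = 10_000    # assumed workload

separate = transfer_time_us(NUM_FILES, 3, RTT_US)   # three calls per file
merged = transfer_time_us(NUM_FILES, 1, RTT_US)     # one merged fop per file
```

Under these assumptions the latency cost drops from 6 seconds to 2 seconds
for the same 10,000 files, regardless of how small each file is.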

* mod_glusterfs: Aimed at solving a problem web hosting companies have. We
embedded glusterfs into apache-1.3.x, lighttpd-1.4.x and lighttpd-1.5 (work
is going on for apache-2.0). By doing this, we save the context switch
overhead, which was significant when web servers used a glusterfs mountpoint
as document root. So now the web servers themselves can be cluster filesystem
aware, and hence they see a much bigger storage system well within their own
space.
For Apache 1.3 -
http://gluster.org/docs/index.php/Getting_modglusterfs_to_work
For Lighttpd - http://gluster.org/docs/index.php/Mod_glusterfs_for_lighttpd

* Improvements to io-cache to handle mod_glusterfs better.

Other significant work going on in parallel to these things:
* work towards proper input validation, and aborting in cases where any
memory corruption is seen.
* work towards reducing the overhead caused by runtime memory allocations
and frees.
* work on reducing string-based operations in the code base, which
reduces CPU usage.
* log message improvements (reduce the amount of logging, make log rotation
work seamlessly).
* strict checks at init() time, so the mount itself won't happen if some
config is not valid. (To get the points mentioned in the 'Best Practices'
page into the codebase itself.)
* making sure the ports to other OSes work fine.

We expect to get this branch to stability very soon (within a month or so),
so we won't be having complaints about small file performance and
timeout/hang issues anymore (as seen on the 1.3.x branch). Hence your help
in testing this out with your application and your configuration, and
reporting bugs, would help us get it to stability even faster.

Regards,
GlusterFS Team

PS: Currently this QA release has not been tested on any OS other than
GNU/Linux. I will write back as soon as we have a fix for each specific OS.

Regards,
Amar

-- 
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!