gluster client performance

jstroik at ssec.wisc.edu (Jesse Stroik) · Tue, 09 Aug 2011 13:41:01 -0500

Pavan,

Thank you for your help.  We wanted to get back to you with our results 
and observations.  I'm cc'ing gluster-users for posterity.

We did experiment with enable-trickling-writes.  That was one of the 
translator tunables we wanted to know the precise syntax for so that we 
could be certain we were disabling it.  As hoped, disabling trickling 
writes improved performance somewhat.

We are definitely interested in any other undocumented write-buffer 
related tunables.  We've tested the documented tuning parameters.

Performance improved significantly when we switched the clients to a
mainline kernel (2.6.35-13).  We also updated to OFED 1.5.3, but that
was not responsible for the performance improvement.

Our findings for write performance with a 32KB block size (cp):

250-300MB/sec single stream performance
400MB/sec multiple-stream per client performance

This is much higher than we observed with the 2.6.18 kernel series.
With the 2.6.18 line, we also observed virtually no difference between
single-stream and multi-stream tests, suggesting a bottleneck with the
fabric.

Both 2.6.18 and 2.6.35-13 performed very well (about 600MB/sec) when 
writing 128KB blocks.
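
A block-size comparison of this kind can be approximated with a small
script.  The following is only a minimal sketch and is not the method
behind the numbers above (those came from cp).  The function name
write_throughput, its parameters, and the choice to fsync before
stopping the timer are all assumptions made for illustration.

```python
import os
import time


def write_throughput(path, block_size, total_bytes):
    """Write about total_bytes to path in block_size chunks.

    Returns the observed rate in MB/sec (1 MB = 10**6 bytes).
    Hypothetical helper for illustration only.
    """
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    start = time.perf_counter()
    # buffering=0 so each write() call reaches the OS with block_size bytes
    with open(path, "wb", buffering=0) as f:
        for _ in range(blocks):
            f.write(block)
        # Include the flush to storage in the timing (an assumption;
        # cp does not do this).
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (blocks * block_size) / elapsed / 1e6
```

For example, write_throughput("/mnt/gluster/testfile", 32 * 1024, 1 << 30)
and the same call with 128 * 1024 would give a rough 32KB versus 128KB
comparison; the mount path here is hypothetical.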

When I disabled write-behind on the 2.6.18 series of kernels as a test,
performance plummeted to a few MB/sec when writing block sizes smaller
than 128KB.  We did not test this extensively.

Disabling enable-trickling-writes gave us approximately a 20% boost, 
reflected in the numbers above, for single-stream writes.  We observed 
no significant difference with several streams per client due to 
disabling that tunable.

For reference, we are running another cluster file system on the same 
underlying hardware/software.  With both the old kernel (2.6.18.x) and 
the new kernel (2.6.35-13) we get approximately:

450-550MB/sec single stream performance
1200MB+/sec multiple stream per client performance

We set the test directory to write entire files to a single LUN, which
is how we configured gluster, in an effort to mitigate differences.

It is treacherous to speculate why we might be more limited with gluster
over RDMA than with the other cluster file system without a significant
amount of analysis.  That said, I wonder if there may be an issue with
the way in which fuse handles write buffers, causing a bottleneck for
RDMA.

The bottom line is that our observed performance was poor using the 
2.6.18 RHEL 5 kernel line relative to the mainline (2.6.35) kernels. 
Updating to the newer kernels was well worth the testing and downtime. 
Hopefully this information can help others.

Best,
Jesse Stroik