gluster client performance

tcp at gluster.com (Pavan T C) · Wed, 10 Aug 2011 02:53:38 +0530

On Wednesday 10 August 2011 12:11 AM, Jesse Stroik wrote:
> Pavan,
>
> Thank you for your help. We wanted to get back to you with our results
> and observations. I'm cc'ing gluster-users for posterity.
>
> We did experiment with enable-trickling-writes. That was one of the
> translator tunables we wanted to know the precise syntax for so that we
> could be certain we were disabling it. As hoped, disabling trickling
> writes improved performance somewhat.
>
> We are definitely interested in any other undocumented write-buffer
> related tunables. We've tested the documented tuning parameters.
>
> Performance improved significantly when we switched the clients to the
> mainline kernel (2.6.35-13). We also updated to OFED 1.5.3, but that was
> not responsible for the performance improvement.
>
> Our write performance findings with a 32KB block size (cp):
>
> 250-300MB/sec single-stream performance
> 400MB/sec multiple-stream performance per client

OK. Let's see if we can improve this further. Please set the following 
tunables:

For write-behind -
option cache-size 16MB

For read-ahead -
option page-count 16

For io-cache -
option cache-size 64MB

You will need to place these lines in the client volume file, restart 
the server, and remount the volume on the clients.
Your client (FUSE) volume file sections should look like the following, with 
your own volume name substituted -

volume testvol-write-behind
     type performance/write-behind
     option cache-size 16MB
     subvolumes testvol-client-0
end-volume

volume testvol-read-ahead
     type performance/read-ahead
     option page-count 16
     subvolumes testvol-write-behind
end-volume

volume testvol-io-cache
     type performance/io-cache
     option cache-size 64MB
     subvolumes testvol-read-ahead
end-volume
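If you maintain several volumes, these three sections can be generated instead of hand-edited. Below is a minimal Python sketch. `perf_stack` is a hypothetical helper, not part of GlusterFS. It only reproduces the text format of the sections above: each translator's `subvolumes` line points at the translator below it, and `<volname>-client-0` is the bottom of the stack, as in the example. It does not write or reload any volume file.

```python
from typing import Optional


def perf_stack(volname: str, bottom: Optional[str] = None,
               wb_cache: str = "16MB", ra_pages: int = 16,
               ioc_cache: str = "64MB") -> str:
    """Render write-behind -> read-ahead -> io-cache sections for volname.

    Hypothetical helper: it emits volume-file text only.
    """
    # (translator, option name, value), listed bottom-up as in the example.
    layers = [
        ("write-behind", "cache-size", wb_cache),
        ("read-ahead", "page-count", ra_pages),
        ("io-cache", "cache-size", ioc_cache),
    ]
    below = bottom if bottom is not None else f"{volname}-client-0"
    sections = []
    for xlator, option, value in layers:
        name = f"{volname}-{xlator}"
        sections.append(
            f"volume {name}\n"
            f"    type performance/{xlator}\n"
            f"    option {option} {value}\n"
            f"    subvolumes {below}\n"
            f"end-volume\n"
        )
        below = name  # the next translator stacks on top of this one
    return "\n".join(sections)


if __name__ == "__main__":
    print(perf_stack("testvol"))
```

Whatever sits above io-cache in your file must then refer to `<volname>-io-cache` as its subvolume.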

Run your copy command with these tunables. For now, let's keep the 
default setting for trickling writes, which is 'ENABLED'. You can simply 
remove this tunable from the volume file to get the default behaviour.
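To compare runs before and after these changes more consistently than with cp, here is a minimal Python sketch of a single-stream vs multi-stream write test. It assumes the volume is mounted at the hypothetical path `/mnt/testvol`. Each stream writes its own file in fixed-size blocks (32KB in the usage below), and the timing includes an `fsync`, so the figures will not match cp's exactly. Threads are sufficient here because `os.write` releases the GIL during the system call.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mount point: replace with the path where your volume is mounted.
MOUNT_POINT = "/mnt/testvol"
MB = 1024 * 1024  # rates below are in units of 2**20 bytes per second


def write_file(path: str, block_size: int, total_bytes: int) -> int:
    """Write total_bytes of zeros to path in block_size chunks, then fsync."""
    block = memoryview(b"\0" * block_size)
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            # The last chunk may be short, and os.write may write fewer
            # bytes than requested, so always advance by the actual count.
            written += os.write(fd, block[: min(block_size, total_bytes - written)])
        os.fsync(fd)  # include the flush to stable storage in the timing
    finally:
        os.close(fd)
    return written


def measure_mb_per_sec(directory: str, streams: int, block_size: int,
                       bytes_per_stream: int) -> float:
    """Run `streams` concurrent writers (one file each); return aggregate MB/sec."""
    paths = [os.path.join(directory, f"stream-{i}.dat") for i in range(streams)]
    start = time.perf_counter()
    try:
        with ThreadPoolExecutor(max_workers=streams) as pool:
            totals = list(pool.map(
                lambda p: write_file(p, block_size, bytes_per_stream), paths))
        elapsed = time.perf_counter() - start
    finally:
        for p in paths:  # clean up the test files even if a writer failed
            if os.path.exists(p):
                os.remove(p)
    return sum(totals) / MB / elapsed


if __name__ == "__main__":
    if os.path.isdir(MOUNT_POINT):
        for n in (1, 4):
            rate = measure_mb_per_sec(MOUNT_POINT, n, 32 * 1024, 1024 * MB)
            print(f"{n} stream(s), 32KB blocks: {rate:.0f} MB/sec")
    else:
        print(f"{MOUNT_POINT} not found; set MOUNT_POINT to your mount")
```

Run it once with the current volume file and once with the new tunables, keeping the stream count and block size fixed between runs.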

Pavan
>
> This is much higher than we observed with the 2.6.18 kernel series. With
> the 2.6.18 line, we also observed virtually no difference between
> single-stream and multi-stream tests, suggesting a bottleneck in the fabric.
>
> Both 2.6.18 and 2.6.35-13 performed very well (about 600MB/sec) when
> writing 128KB blocks.
>
> When I disabled write-behind on the 2.6.18 series of kernels as a test,
> performance plummeted to a few MB/sec when writing block sizes smaller
> than 128KB. We did not test this extensively.
>
> Disabling enable-trickling-writes gave us approximately a 20% boost for
> single-stream writes, which is reflected in the numbers above. With several
> streams per client, disabling that tunable made no significant difference.
>
> For reference, we are running another cluster file system on the same
> underlying hardware/software. With both the old kernel (2.6.18.x) and
> the new kernel (2.6.35-13) we get approximately:
>
> 450-550MB/sec single-stream performance
> 1200+MB/sec multiple-stream performance per client
>
> We configured the test directory to write entire files to a single LUN,
> matching how we configured gluster, to minimize differences between the two.
>
> It is treacherous to speculate why we might be more limited with gluster
> over RDMA than with the other cluster file system without a significant
> amount of analysis. That said, I wonder whether the way FUSE handles
> write buffers may be causing a bottleneck for RDMA.
>
> The bottom line is that our observed performance was poor using the
> 2.6.18 RHEL 5 kernel line relative to the mainline (2.6.35) kernels.
> Updating to the newer kernels was well worth the testing and downtime.
> Hopefully this information can help others.
>
> Best,
> Jesse Stroik
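Some simple arithmetic on the figures reported in the quoted message makes the comparison concrete. No new measurements are involved. The following is a minimal Python sketch. It assumes the 20% trickling-writes boost is relative to the trickling-enabled result, which the message does not state explicitly. On that reading, gluster's single-stream rate is roughly 45% to 67% of the other file system's, and its multi-stream rate is at most about a third. The single-stream rate with trickling writes enabled would have been roughly 208-250MB/sec.

```python
# Throughput figures (MB/sec) as reported in the quoted message.
GLUSTER_SINGLE = (250, 300)
OTHER_FS_SINGLE = (450, 550)
GLUSTER_MULTI = 400
OTHER_FS_MULTI_MIN = 1200  # reported as "1200+", so this is a lower bound


def ratio_bounds(ours, theirs):
    """Lowest and highest possible ours/theirs ratio for two (low, high) ranges."""
    return ours[0] / theirs[1], ours[1] / theirs[0]


def before_boost(after, boost=0.20):
    """Undo a fractional boost, ASSUMING it was measured relative to the old value."""
    return after / (1 + boost)


if __name__ == "__main__":
    lo, hi = ratio_bounds(GLUSTER_SINGLE, OTHER_FS_SINGLE)
    print(f"single stream: gluster at {lo:.0%} to {hi:.0%} of the other file system")
    print(f"multi stream: gluster at most {GLUSTER_MULTI / OTHER_FS_MULTI_MIN:.0%}")
    print("single stream, trickling writes enabled: about "
          f"{before_boost(GLUSTER_SINGLE[0]):.0f}-"
          f"{before_boost(GLUSTER_SINGLE[1]):.0f} MB/sec")
```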