Shehjar Tikoo wrote:
The answer to your question is, yes, it will be possible to export your
local file system with knfsd and glusterfs distributed-replicated
volumes with the Gluster NFS translator, BUT not in the first release.
See comment above. Isn't that all the more reason to double check
performance figures before even bothering?
In fact, I may have just convinced myself to acquire some iozone
performance figures. Will report later.
OK, I couldn't get iozone to report sane results. glfs was reporting
figures in the ballpark I'd expect (between 7MB/s and
110MB/s, which is what I'd expect on gigabit ethernet). NFS was
reporting figures that look more like the memory bandwidth, so I'd
guess that FS-Cache was taking over. With O_DIRECT and O_SYNC, figures
were in the 700KB/s range for NFS, which is clearly not sane, because in
actual use the two seem fairly equivalent.
So - I did a redneck test instead - dd 64MB of /dev/zero to a file on
the mounted partition.
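For anyone wanting to repeat it, here is a minimal sketch of that kind of test. It assumes GNU dd (for `conv=fsync` and the `1M` size suffix) and root for dropping the page cache; the function and file names are made up, and the figures above may well have been taken without `conv=fsync`:

```shell
#!/bin/sh
# Single-stream dd test: write zeros to the mount under test, then read back.
# Assumes GNU dd (conv=fsync, the 1M size suffix). Names are illustrative.

# dd_write_test DIR [SIZE_MB]   (default 64MB, as in the test above)
dd_write_test() {
    # conv=fsync: dd calls fsync() on the output file before exiting, so the
    # reported rate includes flushing the data out of the client's page cache.
    dd if=/dev/zero of="$1/ddtest.bin" bs=1M count="${2:-64}" conv=fsync
}

# dd_read_test DIR
dd_read_test() {
    # Dropping the page cache needs root and a writable /proc/sys; without it
    # the read is likely served from memory rather than over the network.
    if [ -w /proc/sys/vm/drop_caches ]; then
        sync
        echo 3 > /proc/sys/vm/drop_caches || true
    fi
    dd if="$1/ddtest.bin" of=/dev/null bs=1M
}
```

Usage would be `dd_write_test /mnt/test` followed by `dd_read_test /mnt/test` (a hypothetical mount point); dd reports the transfer rate on stderr.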
On writes, NFS gets 4.4MB/s, GlusterFS (server side AFR) gets 4.6MB/s.
Pretty even.
On reads GlusterFS gets 117MB/s, NFS gets 119MB/s (on the first read
after flushing the caches, after that it goes up to 600MB/s). The
difference in the unbuffered readings seems to be in the sane ball
park and the difference on the reads is roughly what I'd expect
considering NFS is running UDP and GLFS is running TCP.
So in conclusion - there is no performance difference between them
worth speaking of. So what is the point in implementing a user-space
NFS handler in glusterfsd when unfsd seems to do the job as well as
glusterfsd could reasonably hope to?
A single dd, which is basically sequential IO, is something even
an undergrad OS 101 project can optimize for. We, on the other hand,
are aiming higher. We'll be providing much better meta-data
performance, something unfsd sucks at (not without reason; I
appreciate the measures it takes to ensure correctness) due to
the large number of system calls it performs; much better support for
concurrency in order to exploit the proliferating multi-cores; and much
better parallelism for multiple NFS clients all hammering away at the
server, again something unfsd does not do.
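A single dd exercises neither metadata nor concurrency. A rough sketch of loads closer to what is described here, assuming a POSIX shell and GNU dd; the function names, file names and default counts are hypothetical and not part of any existing benchmark suite:

```shell
#!/bin/sh
# Hypothetical load generators: concurrent streaming writers, and a burst of
# small-file metadata operations. Assumes GNU dd.

# parallel_write_test DIR [WRITERS] [SIZE_MB]
parallel_write_test() {
    i=1
    while [ "$i" -le "${2:-4}" ]; do
        # Each writer streams its own file in the background.
        dd if=/dev/zero of="$1/par.$i" bs=1M count="${3:-64}" conv=fsync 2>/dev/null &
        i=$((i + 1))
    done
    wait    # returns once every background job of this shell has finished
}

# metadata_test DIR [NFILES]
metadata_test() {
    dir="$1/meta"
    mkdir -p "$dir"
    i=1
    while [ "$i" -le "${2:-1000}" ]; do
        : > "$dir/f.$i"             # create an empty file (open + close)
        i=$((i + 1))
    done
    ls -l "$dir" > /dev/null        # stat every entry
    rm -rf "$dir"                   # unlink every entry, then the directory
}
```

Timing these with `time` on one client, and then running them from several clients at once, would approximate the "multiple NFS clients hammering away" case far better than one sequential stream.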
Since you (quite rightly) say that a single sequential I/O isn't a
particularly valid real-world test case, I now have some performance
figures, and they are showing a similar equivalence between glfs and
unfsd client connections (see tests 8 and 9 below).
The testing was done using the following method:
make clean
# prime the caches, to give every setup the benefit of the doubt
find . -type f -exec cat '{}' + > /dev/null
sync
# The machines involved are quad core, so run two jobs per core
time make -j8 all
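The same procedure can be wrapped in a function for repeated runs. This is only a sketch: it assumes GNU coreutils (`nproc`, `date +%s`), replaces the shell's `time` with wall-clock seconds from `date`, and honours a `MAKE` override so it can be dry-run; `timed_build` is a hypothetical name:

```shell
#!/bin/sh
# Sketch of the build benchmark above as a reusable function.
# Assumes GNU coreutils (nproc, date +%s); MAKE may be overridden.

# timed_build SRC_DIR
timed_build() {
    (
        cd "$1" || exit 1
        make_cmd=${MAKE:-make}
        njobs=$(( $(nproc) * 2 ))       # two jobs per core: -j8 on a quad core
        "$make_cmd" clean
        # Prime the caches: read every file once, discarding the data.
        find . -type f -exec cat '{}' + > /dev/null
        sync
        start=$(date +%s)
        "$make_cmd" -j"$njobs" all
        status=$?
        elapsed=$(( $(date +%s) - start ))
        printf 'elapsed %d:%02d\n' $((elapsed / 60)) $((elapsed % 60))
        exit "$status"
    )
}
```

`timed_build /path/to/kernel/tree` performs one measurement. The shell's `time`, as used above, additionally reports user and system CPU time, which helps tell CPU-bound runs from I/O-bound ones.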
   Setup                Time   Notes
1) pure ext3            6:40   CPU bound
2) ext3                15:15   rootfs (glfs, no cache), I/O bound
3) ext3+knfsd           7:02   mostly network bound
4) ext3+unfsd          16:04
5) glfs                61:54   rootfs (glfs, no cache), I/O bound
6) glfs+cache          32:32   rootfs (glfs, no cache), I/O bound
7) glfs+unfsd         278:30
8) glfs+cache+unfsd   189:15
9) glfs+cache+glfs    186:43
Notes:
- Time is in minutes:seconds
- GlusterFS 2.0.9 was used in all cases, on RHEL 5.4, 64-bit
- The times are for building the RHEL 5.4 kernel
- noatime is used on all mounts
- cache means that caching was applied on the server in the form of
writebehind and io-cache translators directly on top of the assembled
AFR bricks.
- All tests except 2, 5, and 6 were done on a Quad Core2 3.2 GHz with
2GB of RAM
- Tests 2, 5, and 6 were done on a Phenom X4 2.8GHz with 4GB of RAM. In
this instance the figures are reasonably comparable
- In tests 2, 5 and 6, rootfs (which is where gcc and other binaries live)
was on glfs, which caused further slow-down.
- In all cases except 1 (where all the files were local), the server was
the same Phenom X4 machine with 4GB of RAM. It was paired in AFR with an
Atom 330 machine in all cases where glfs was used.
- Gigabit network was used in all cases.
- The client was always connecting to a single, server-assembled AFR
volume (so the server was proxying write requests to the slaved Atom 330
machine).
- glfs rootfs runs without any performance translators in all cases and
with --disable-direct-io=off
- The volume containing /usr/src, where the source code being compiled
resides, was always mounted without the direct-io mount parameter mentioned above.
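To make the "cache" note concrete, here is a sketch of what such a server-side stack might look like as a GlusterFS 2.0-style volfile fragment, written out from the shell. The volume names, the output file and the `local-brick`/`atom-brick` subvolumes (assumed to be defined earlier in the same file) are hypothetical, translator options are left at their defaults, and putting io-cache above write-behind is an assumption, since the note does not give the order:

```shell
#!/bin/sh
# Hypothetical sketch: write a volfile fragment stacking write-behind and
# io-cache directly on top of an AFR (cluster/replicate) volume.
# local-brick and atom-brick are assumed to be defined elsewhere in the file.

# write_server_vol OUTPUT_FILE
write_server_vol() {
    cat > "$1" <<'EOF'
volume afr
  type cluster/replicate
  subvolumes local-brick atom-brick
end-volume

volume writebehind
  type performance/write-behind
  subvolumes afr
end-volume

volume iocache
  type performance/io-cache
  subvolumes writebehind
end-volume
EOF
}
```

In a setup like this, the topmost volume (`iocache` here) would be what the server's `protocol/server` translator exports to clients.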
Even if we ignore tests 2, 5 and 6, the results are quite concerning:
1) pure ext3            6:40   CPU bound
3) ext3+knfsd           7:02   mostly network bound
4) ext3+unfsd          16:04
7) glfs+unfsd         278:30
8) glfs+cache+unfsd   189:15
9) glfs+cache+glfs    186:43
Results 1, 3 and 4 above are pretty much just the baseline for how long the
operation takes without any glfs involvement.
The main point here is the comparison between results 7, 8 and 9:
7) glfs+unfsd         278:30
8) glfs+cache+unfsd   189:15
9) glfs+cache+glfs    186:43
Specifically, this goes to the point I was making earlier about glfs vs. unfsd
performance. The difference appears to be quite negligible (2:32, or under
1.5%, between tests 8 and 9), so I'd dare say that rolling an NFS server
into glusterfs will do absolutely nothing for performance.
So in bullet points:
- unfsd runs at a bit under half the speed of knfsd.
- glfs without writebehind + io-cache translators runs approximately 10x
slower than ext3 (when backed by ext3 as in this test, at least).
- writebehind + io-cache substantially improves performance: the build is
about 1.9x faster in test 6 than in test 5, and about 1.5x faster in test 8
than in test 7.
- With glfs being used for the replicated volume to be exported to
clients, the build is roughly 12x (test 8, with caching) to 17x (test 7,
without) slower than in the nearest comparable case, which is ext3+unfsd.
- there is no meaningful performance difference between unfsd and glfs for
the exported volumes.
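The ratios behind these bullets can be recomputed from the table with a few lines of shell and awk. This is just a convenience sketch; `to_seconds` and `ratio` are made-up helper names:

```shell
#!/bin/sh
# Recompute the slowdown ratios quoted above from the m:s build times.

# to_seconds M:SS -> total seconds, e.g. 7:02 -> 422
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# ratio SLOW FAST -> how many times longer SLOW took than FAST (1 decimal)
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

echo "unfsd vs knfsd (4/3):              $(ratio 16:04 7:02)"      # 2.3
echo "glfs vs pure ext3 (5/1):           $(ratio 61:54 6:40)"      # 9.3
echo "cache gain, rootfs on glfs (5/6):  $(ratio 61:54 32:32)"     # 1.9
echo "cache gain, unfsd export (7/8):    $(ratio 278:30 189:15)"   # 1.5
echo "glfs+unfsd vs ext3+unfsd (7/4):    $(ratio 278:30 16:04)"    # 17.3
echo "glfs+cache+unfsd vs ext3+unfsd (8/4): $(ratio 189:15 16:04)" # 11.8
echo "unfsd vs glfs export (8/9):        $(ratio 189:15 186:43)"   # 1.0
```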
It might be interesting to see whether running additional caching
translators on the client itself might positively affect performance,
but considering how long these tests have taken, I'm feeling less than
motivated to run more at the moment.
I'm looking forward to seeing how 3.0.1 will stack up against this with
its complete evasion of libfuse.
Gordan