Hi all,
Sage Weil <sweil@xxxxxxxxxx> wrote:
> Who is running nfs-ganesha's FSAL to export CephFS? What has your
> experience been?
> (We are working on building proper testing and support for this into
> Mimic, but the ganesha FSAL has been around for years.)
After we had moved most of our file-based data to a CephFS environment
and were suffering from what later turned out to be a (mis-)configuration
issue with our existing nfsd server, I decided to give Ganesha a try.
We run a Ceph cluster on three servers: openSUSE Leap 42.3, Ceph
Luminous (latest stable), 2x10G interfaces for intra-cluster
communication, 2x1G towards the NFS clients. CephFS metadata is on an
SSD pool; the actual data is on SAS HDDs (12 OSDs). The Ganesha version
is 2.5.2.0+git.1504275777.a9d23b98f-3.6. All Ganesha/nfsd services run
on one of the servers that are also Ceph nodes.
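For context, a CephFS export in Ganesha is defined via an EXPORT block
in ganesha.conf, roughly as sketched below; the Export_Id, paths and
access settings here are illustrative placeholders, not our actual
configuration:

EXPORT {
    Export_Id = 1;            # arbitrary, but unique per export
    Path = /;                 # path inside CephFS to export
    Pseudo = /cephfs;         # NFSv4 pseudo-fs path seen by clients
    Access_Type = RW;
    Squash = No_Root_Squash;
    Protocols = 3, 4;
    Transports = TCP;

    FSAL {
        Name = CEPH;          # use the CephFS FSAL instead of VFS
    }
}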
We run an automated, distributed build&stage environment (tons of gcc
compiles on multiple clients, some Java compiles, RPM builds etc.),
including nightly test build runs. These usually take about 8 hours
when using kernel nfsd and local storage on the same servers that also
provide the Ceph service.
After switching to Ganesha (with the CephFS FSAL, Ganesha running on the
same server where we had originally run nfsd) and starting test runs
of the same workload, we aborted the runs after about 12 hours - by
then, only an estimated 60 percent of the jobs had completed.
For comparison, when now using kernel nfsd to serve the CephFS shares
(mounting the single CephFS via the kernel FS module on the server that
runs nfsd, and exporting multiple sub-directories via nfsd), we see an
increase of between zero and eight percent over the original run time.
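In case anyone wants to reproduce that setup, the re-export boils down
to something like the following (monitor addresses, paths and client
networks are placeholders, not our actual values):

# kernel CephFS mount on the NFS server
mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# then export sub-directories via the usual /etc/exports, e.g.
#   /mnt/cephfs/builds  10.0.1.0/24(rw,no_subtree_check)
# and reload the export table
exportfs -ra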
So to us, comparing "Ganesha+CephFS FSAL" to "kernel nfsd with kernel
CephFS module", the latter wins. Or to put it the other way around,
Ganesha seems unusable to us in its current state, judging by the
slowness observed.
Other issues I noticed:
- the directory size, as shown by "ls -l" on the client, was very
different from that shown when mounting via nfsd ;)
- "showmount" did not return any entries, which would have (later on,
had we continued to use Ganesha) caused problems with our dynamic
automounter maps
Please note that I did not have time to do intensive testing against
different Ganesha parameters. The only runs I made were with or
without "MaxRead = 1048576; MaxWrite = 1048576;" per share, following a
comment about buffer sizes. These changes didn't seem to make much of a
difference, though.
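For reference, those two tunables go into the per-share EXPORT block,
i.e. something along these lines (rest of the block omitted):

EXPORT {
    # ... Export_Id, Path, Pseudo, FSAL { Name = CEPH; } etc. ...
    MaxRead = 1048576;
    MaxWrite = 1048576;
}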
We closely monitor our network and server performance, and I could
clearly see a huge drop in network traffic (NFS server to clients) when
switching from nfsd to Ganesha, and a corresponding increase when
switching back to nfsd (sharing the CephFS mount). None of the servers
seemed to be under excessive load during these tests, but it was
obvious that Ganesha took its share of CPU - maybe the bottleneck was
some single-threaded operation, leaving Ganesha unable to make use of
the other, idling cores. But I'm just guessing here.
Regards,
J