Hi all,
Sage Weil <sweil@xxxxxxxxxx> wrote:
> Who is running nfs-ganesha's FSAL to export CephFS? What has your
> experience been?
> (We are working on building proper testing and support for this into
> Mimic, but the ganesha FSAL has been around for years.)
After we had moved most of our file-based data to a CephFS environment
and were suffering from what later turned out to be a (mis-)configuration
issue with our existing nfsd server, I decided to give Ganesha a try.
We run a Ceph cluster on three servers: openSUSE Leap 42.3, Ceph
Luminous (latest stable), 2x10G interfaces for intra-cluster
communication, 2x1G towards the NFS clients. CephFS metadata is on an
SSD pool; the actual data is on SAS HDDs (12 OSDs). The Ganesha version
is 2.5.2.0+git.1504275777.a9d23b98f-3.6. All Ganesha/nfsd services run
on one of the servers that are also Ceph nodes.
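For context, a CephFS export in Ganesha is defined via an EXPORT block
in ganesha.conf, roughly as sketched below; the Export_Id, paths and
access settings here are illustrative placeholders, not our actual
configuration:

EXPORT {
    Export_Id = 1;            # arbitrary, but unique per export
    Path = /;                 # path inside CephFS to export
    Pseudo = /cephfs;         # NFSv4 pseudo-fs path seen by clients
    Access_Type = RW;
    Squash = No_Root_Squash;
    Protocols = 3, 4;
    Transports = TCP;

    FSAL {
        Name = CEPH;          # use the CephFS FSAL instead of VFS
    }
}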
We run an automated, distributed build&stage environment (tons of gcc
compiles on multiple clients, some Java compiles, RPM builds etc.),
including nightly test build runs. These usually take about 8 hours
when using kernel nfsd and local storage on the same servers that also
provide the Ceph service.
After switching to Ganesha (with the CephFS FSAL, Ganesha running on the
same server where we had originally run nfsd) and starting test runs
of the same workload, we aborted the runs after about 12 hours - by
then, only an estimated 60 percent of the jobs had completed.
For comparison, when now using kernel nfsd to serve the CephFS shares
(mounting the single CephFS via the kernel FS module on the server that
runs nfsd, and exporting multiple sub-directories via nfsd), we see an
increase of between zero and eight percent over the original run time.
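In case anyone wants to reproduce that setup, the re-export boils down
to something like the following (monitor addresses, paths and client
networks are placeholders, not our actual values):

# kernel CephFS mount on the NFS server
mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# then export sub-directories via the usual /etc/exports, e.g.
#   /mnt/cephfs/builds  10.0.1.0/24(rw,no_subtree_check)
# and reload the export table
exportfs -ra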
So to us, comparing "Ganesha+CephFS FSAL" to "kernel nfsd with kernel
CephFS module", the latter wins. Or to put it the other way around,
Ganesha seems unusable to us in its current state, judging by the
slowness observed.
Other issues I noticed:
- the directory size, as shown by "ls -l" on the client, was very
different from that shown when mounting via nfsd ;)
- "showmount" did not return any entries, which would have (later on,
had we continued to use Ganesha) caused problems with our dynamic
automounter maps
Please note that I did not have time to do intensive testing against
different Ganesha parameters. The only runs I made were with or
without "MaxRead = 1048576; MaxWrite = 1048576;" per share, following a
comment about buffer sizes. These changes didn't seem to make much of a
difference, though.
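For reference, those two tunables go into the per-share EXPORT block,
i.e. something along these lines (rest of the block omitted):

EXPORT {
    # ... Export_Id, Path, Pseudo, FSAL { Name = CEPH; } etc. ...
    MaxRead = 1048576;
    MaxWrite = 1048576;
}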
We closely monitor our network and server performance, and I could
clearly see a huge drop in network traffic (NFS server to clients) when
switching from nfsd to Ganesha, and a corresponding increase when
switching back to nfsd (sharing the CephFS mount). None of the servers
seemed to be under excessive load during these tests, but it was
obvious that Ganesha took its share of CPU - maybe the bottleneck was
some single-threaded operation, leaving Ganesha unable to make use of
the other, idling cores. But I'm just guessing here.
Regards,
J