On Tue, 01 Sep 2009 11:33:38 +0530 Shehjar Tikoo <shehjart at gluster.com> wrote:

> Stephan von Krawczynski wrote:
> > On Mon, 31 Aug 2009 19:48:46 +0530 Shehjar Tikoo
> > <shehjart at gluster.com> wrote:
> >
> >> Stephan von Krawczynski wrote:
> >>> Hello all,
> >>>
> >>> after playing around for some weeks we decided to make some
> >>> real-world tests with glusterfs. We took an nfs-client and
> >>> mounted the very same data with glusterfs. The client does some
> >>> logfile processing every 5 minutes and needs around 3.5 minutes
> >>> runtime in an nfs setup. We found that it makes no sense to
> >>> try this setup with gluster replicate as long as we do not have
> >>> the same performance in a single-server setup with glusterfs. So
> >>> now we have one server mounted (halfway replicate) and would
> >>> like to tune performance. Does anyone have experience with a
> >>> simple replacement like that? We found that almost all
> >>> performance options have exactly zero effect. The only thing
> >>> that seems to make at least some difference is read-ahead on the
> >>> server. We end up with around 4.5 - 5.5 minutes runtime of the
> >>> scripts, which is on the edge, as we need something well below 5
> >>> minutes (just like nfs was). Our goal is to maximise performance
> >>> in this setup and then try a real replication setup with two
> >>> servers. The load itself looks like around 100 scripts starting
> >>> at one time and processing their data.
> >>>
> >>> Any ideas?
> >>>
> >> What nfs server are you using? The in-kernel one?
> >
> > Yes.
> >
> >> You could try the unfs3booster server, which is the original unfs3
> >> with our modifications for bug fixes and slight performance
> >> improvements. It should give better performance in certain cases
> >> since it avoids the FUSE bottleneck on the server.
> >>
> >> For more info, do take a look at this page:
> >> http://www.gluster.org/docs/index.php/Unfs3boosterConfiguration
> >>
> >> When using unfs3booster, please use GlusterFS release 2.0.6 since
> >> that has the required changes to make booster work with NFS.
> >
> > I read the docs, but I don't understand the advantage. Why should we
> > use nfs as a kind of transport layer to an underlying glusterfs
> > server when we can easily export the service (i.e. glusterfs)
> > itself? Remember, we don't want nfs on the client any longer, but a
> > replicate setup with two servers (though we do not use it right now,
> > it nevertheless stays our primary goal).
>
> Ok. My answer was simply under the impression that moving to NFS
> was the motive. unfs3booster-over-gluster is a better solution than
> kernel-nfs-over-gluster because it avoids the FUSE layer completely.

Sorry, to make that clear again: I don't want to use NFS unless
ultimately necessary. I would be happy to use a complete glusterfs
environment without any patches and glue to nfs, cifs or the like.

> > It sounds obvious to me that nfs-over-gluster must be slower than
> > pure kernel-nfs. On the other hand, glusterfs per se may even have
> > some advantages on the network side, iff performance tuning (and of
> > course the options themselves) is well designed. The first thing we
> > noticed is that load dropped dramatically both on server and client
> > when not using kernel-nfs. The client dropped from around 20 to
> > around 4, the server from around 10 to around 5. Since all boxes are
> > pretty much dedicated to their respective jobs, a lot of caching is
> > going on anyway.
>
> Thanks, that is useful information.
>
> > So I would not expect nfs to have advantages only because it is
> > kernel-driven. And the current numbers (a loss of around 30% in
> > performance) show that nfs performance is not completely out of
> > reach.
> That is true, we do have setups performing as well and in some
> cases better than kernel NFS despite the replication overhead. It
> is a matter of testing and arriving at a config that works for your
> setup.
>
> > What advantages would you expect from using unfs3booster at all?
>
> To begin with, unfs3booster must be compared against kernel nfsd and
> not against a GlusterFS-only config. When comparing with kernel nfsd,
> one should understand that knfsd involves the FUSE layer, the kernel's
> VFS and network layer, all of which have their advantages and also
> disadvantages, especially FUSE when used with the kernel nfsd. The
> bottlenecks in the FUSE+knfsd interaction are well documented
> elsewhere.
>
> unfs3booster lets you avoid the FUSE layer, the VFS, etc. and talk
> directly to the network and, through that, to the GlusterFS server. In
> our measurements, we found that we could perform better than kernel
> nfs-over-gluster by avoiding FUSE and using our own caching (io-cache),
> buffering (write-behind, read-ahead) and request scheduling
> (io-threads).
>
> > Another thing we really did not understand is the _negative_ effect
> > of adding io-threads on client or server. Our nfs setup needs around
> > 90 nfs kernel threads to run smoothly. Every number greater than 8
> > io-threads measurably reduces the performance of glusterfs.
>
> The main reason why knfsds need a higher number of threads is simply
> that knfsd threads are highly io-bound, that is, they wait for the
> disk IO to complete in order to serve each NFS request.
>
> On the other hand, with io-threads, the right number actually depends
> on the point at which io-threads are used. For example, if you're
> using io-threads just above posix or features/locks, the scenario is
> much like kernel nfsd threads, where each io-thread blocks till the
> disk IO is complete. Is this where you've observed the 8 io-thread
> drop-off?
> If so, then it is something we'll need to investigate.
>
> The other place where you can use io-threads is on the GlusterFS
> client side. It is here that the 8-thread drop-off seems possible,
> since the client side in GlusterFS is more CPU hungry than the server,
> and it is possible that 8 io-threads are able to consume as much CPU
> as is available for GlusterFS. Have you observed what the CPU usage
> figures are as you increase the number of io-threads?
>
> How many CPUs did the machine have when you observed the drop-off
> beyond 8 threads?

The client box has a quad-core Core 2 CPU, the server is a dual AMD
Opteron 246. I am currently trying a pretty simple setup with this
client vol file:

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.82.1
  option remote-subvolume testfs
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.82.2
  option remote-subvolume testfs
end-volume

volume replicate
  type cluster/replicate
  option data-self-heal on
  option metadata-self-heal on
  option entry-self-heal on
  subvolumes remote1 remote2
end-volume

volume writebehind
  type performance/write-behind
  #option aggregate-size 1MB  # option is unknown
  #option window-size 1MB
  option cache-size 2MB
  #option block-size 1MB      # option is unknown
  option flush-behind on
  subvolumes replicate
end-volume

I tried read-ahead and others but they don't boost performance at all.
Only writebehind has some positive effect of around 30s runtime (drops
from around 5 mins to 4:30 mins). Unfortunately I need a further
improvement down to around 4 mins runtime, because now, every now and
then, the 5-minute barrier is hit. Remember that the remote2 server is
down in this setup; we use only remote1 currently.

Do you have any ideas how to improve performance?

> -Shehjar

-- 
Regards,
Stephan
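The two io-threads placements Shehjar describes above can be sketched as
volfile fragments. The server volfile below is an assumption (only the
client side appears in the thread); the export directory and volume names
are hypothetical, option names follow GlusterFS 2.0.x conventions, and
thread-count 8 simply marks the drop-off point under discussion. Treat
this as a starting point to test, not a recommendation.

```
# --- Server side (hypothetical): io-threads just above features/locks.
# --- Here each io-thread blocks until disk IO completes, much like a
# --- knfsd thread, so higher thread counts can help.

volume posix
  type storage/posix
  option directory /data/export    # assumed export path
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume testfs                      # named to match remote-subvolume on the client
  type performance/io-threads
  option thread-count 8            # the drop-off point observed in the thread
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.testfs.allow 192.168.82.*
  subvolumes testfs
end-volume

# --- Client side: io-threads above replicate, the CPU-bound placement.
# --- This slots between the existing replicate and writebehind volumes,
# --- i.e. writebehind's "subvolumes replicate" becomes "subvolumes iothreads".

volume iothreads
  type performance/io-threads
  option thread-count 8
  subvolumes replicate
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 2MB
  option flush-behind on
  subvolumes iothreads
end-volume
```

Raising thread-count on one side at a time while watching per-process CPU
(e.g. with top) would show which placement actually hits the 8-thread
ceiling Shehjar asks about.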