That seems to have made it slightly worse. The concurrent "ls -l"s
seemed better, but the web server execution was slightly slower.

On Jan 14, 2008 6:41 PM, Anand Avati <avati@xxxxxxxxxxxxx> wrote:
> Matt,
> for the sake of diagnosing, can you try specifying "option self-heal off"
> in the cluster/unify volume and test again?
>
> thanks,
>
> avati
>
> 2008/1/14, Matt Drew <matt.drew@xxxxxxxxx>:
> > Avati,
> >
> > I tried values of 2, 4, 6, 30, 60, and 120 for -e and -a with no
> > measurable effect. We're not using AFR yet, so there's no issue there.
> >
> > On Jan 14, 2008 12:05 AM, Anand Avati <avati@xxxxxxxxxxxxx> wrote:
> > > Matt,
> > > we are currently investigating AFR + io-cache performance issues
> > > (io-cache not really making full use of caching when AFR does
> > > load-balanced reads). You could override AFR read scheduling by
> > > specifying 'option read-subvolume <first subvolume>' in AFR as a
> > > temporary workaround. That apart, I suggest you mount glusterfs
> > > with large -e and -a argument values, which should improve
> > > performance in such cases quite a bit. Do let us know if that
> > > made any difference.
> > >
> > > avati
> > >
> > > 2008/1/14, Matt Drew <matt.drew@xxxxxxxxx>:
> > > > I've been digging into a seemingly difficult performance issue
> > > > over the last few days. We're running glusterfs mainline 2.5
> > > > patch 628, fuse-2.7.2 with Marian's glfs patch, and kernel
> > > > 2.6.23, currently with one server and two clients (soon to be
> > > > two and four, respectively). The server is a dual-core Opteron
> > > > with a single SATA2 disk (we're planning on AFR for redundancy);
> > > > the clients are dual-core Intel machines. The network transport
> > > > is gigabit ethernet. The server is 32-bit and the clients are
> > > > 64-bit (I can rebuild the server, no problem, if that is the
> > > > issue). Throughput is good, and activity by one process seems
> > > > to work fine.
> > > >
> > > > Our issue is with a PHP script running on the client via the
> > > > glusterfs share. The script has a number of includes, and those
> > > > files have a few more includes. This means a lot of stats as the
> > > > webserver checks to make sure none of the files have changed. If
> > > > we make one call to the script, everything is fine - the code
> > > > completes in 300ms (from local disk it completes in 100ms).
> > > > Similarly, if you run "ls -l" on a large directory (1700 files),
> > > > everything appears to work fine.
> > > >
> > > > However, if we make two concurrent calls to the PHP script, or
> > > > run two copies of "ls -l" on the large directory, everything
> > > > slows down by an order of magnitude. The output of the ls
> > > > commands appears to stutter on each copy - usually one will stop
> > > > and the other will start, but sometimes both will stop for a
> > > > second or two. Adding a third process makes it worse. The PHP
> > > > script takes 2.5 or 3 seconds to complete instead of 300ms, and
> > > > again more requests make it worse - if you request four
> > > > operations concurrently, the finish time jumps to 7 seconds.
> > > > This issue occurs whether the two processes run on a single
> > > > client or one on each of two clients.
> > > >
> > > > Inserting the trace translator doesn't turn up anything unusual
> > > > that I can see, except that it makes the processes run even
> > > > slower (which is expected, of course).
> > > > A tcpdump of the filesystem traffic shows inexplicable gaps of
> > > > 100ms or more with no traffic. The single-process "ls -l" test
> > > > does not show these gaps.
> > > >
> > > > I stripped the server and client down to a bare-minimum unify
> > > > configuration, which didn't seem to make a difference. I'm
> > > > currently running this server/client stack, also without success:
> > > >
> > > > server:
> > > >   ns
> > > >   brick (x2)
> > > >   posix-locks
> > > >   io-threads (16 threads, 64MB)
> > > >   server (ns, brick1, brick2)
> > > >
> > > > client:
> > > >   brick1
> > > >   brick2
> > > >   unify (alu)
> > > >   io-threads (16 threads, 64MB)
> > > >   io-cache (256MB)
> > > >
> > > > At various times I've tried read-ahead with no discernible
> > > > difference. An strace of the client process doesn't return
> > > > anything interesting except a lot of these:
> > > >
> > > > futex(0x12345678, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource
> > > > temporarily unavailable)
> > > >
> > > > These also appear during a single-process test, but they are much
> > > > more prevalent when two processes are running.
> > > >
> > > > What am I doing wrong? :)
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxx
> > > > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> > >
> > > --
> > > If I traveled to the end of the rainbow
> > > As Dame Fortune did intend,
> > > Murphy would be there to tell me
> > > The pot's at the other end.
>
> --
> If I traveled to the end of the rainbow
> As Dame Fortune did intend,
> Murphy would be there to tell me
> The pot's at the other end.
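
For anyone trying to reproduce the symptom described in the thread,
the failing case boils down to two concurrent readers of one large
directory over the mount. This is only a sketch; the mount point and
directory name are hypothetical:

# one reader: "ls -l" over glusterfs completes quickly
time ls -l /mnt/glusterfs/bigdir > /dev/null

# two concurrent readers: output stutters and wall time grows
# by roughly an order of magnitude
ls -l /mnt/glusterfs/bigdir > /dev/null &
ls -l /mnt/glusterfs/bigdir > /dev/null &
time wait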
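
For reference, the server half of the stack described above would
correspond roughly to a volfile like the one below. This is a sketch
only: the directory paths and volume names are invented, and option
names follow the glusterfs 1.3/mainline-2.5 conventions of the time.

# server.vol -- sketch of the server-side stack
volume ns
  type storage/posix
  option directory /export/ns
end-volume

volume brick1-posix
  type storage/posix
  option directory /export/brick1
end-volume

volume brick1-locks
  type features/posix-locks
  subvolumes brick1-posix
end-volume

volume brick1
  type performance/io-threads
  option thread-count 16
  option cache-size 64MB
  subvolumes brick1-locks
end-volume

# brick2-posix, brick2-locks, and brick2 are defined the same way
# over /export/brick2

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.ns.allow *
  option auth.ip.brick1.allow *
  option auth.ip.brick2.allow *
  subvolumes ns brick1 brick2
end-volume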
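
The matching client half would look roughly like this; the server
address is a placeholder, and tuning options for the alu scheduler
are omitted for brevity:

# client.vol -- sketch of the client-side stack
volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.10
  option remote-subvolume ns
end-volume

volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.10
  option remote-subvolume brick1
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.10
  option remote-subvolume brick2
end-volume

volume unify0
  type cluster/unify
  option scheduler alu
  option namespace ns
  subvolumes brick1 brick2
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 16
  option cache-size 64MB
  subvolumes unify0
end-volume

volume iocache
  type performance/io-cache
  option cache-size 256MB
  subvolumes iothreads
end-volume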
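
The two changes suggested upthread would slot into that client
volfile roughly as follows. The AFR volume is purely illustrative,
since AFR is not in use yet, and the mount line assumes the -f
spec-file flag of releases from that era:

# Avati's diagnostic suggestion: disable self-heal in unify
volume unify0
  type cluster/unify
  option scheduler alu
  option namespace ns
  option self-heal off
  subvolumes brick1 brick2
end-volume

# temporary workaround for the AFR + io-cache issue, for whenever
# AFR is introduced: pin all reads to the first subvolume
volume afr0
  type cluster/afr
  option read-subvolume brick1
  subvolumes brick1 brick2
end-volume

# mount with large FUSE entry (-e) and attribute (-a) cache
# timeouts, in seconds
glusterfs -e 60 -a 60 -f /etc/glusterfs/client.vol /mnt/glusterfs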