another look at high concurrency and cpu usage

raghavendra at gluster.com (Raghavendra G) · Tue, 16 Feb 2010 09:57:51 +0400

Also, 3.0.x has improvements related to io-cache.

Are you observing high cpu usage on the client side or the server side? If it
is on the client side, can you remove all performance translators and check
whether you still face the same problem? If removing the performance
translators solves the high cpu usage, can you zero in on a particular
translator (write-behind, io-cache, read-ahead, etc.) by adding and removing
them from the configuration? I suspect io-cache is the culprit.
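
As a starting point for that test, a client volfile with every performance
translator removed might look like the sketch below. It only reuses the
protocol/client and cluster/replicate volumes from the client.vol quoted
further down; "ip" is the placeholder from that file and the remote-subvolume
names are copied as posted, so adjust both to your setup.

```
# client.vol with no performance translators (sketch based on the posted file)
volume glusterfs0-hs
 type protocol/client
 option transport-type tcp
 option remote-host "ip"
 option ping-timeout 10
 option transport.socket.nodelay on
 option remote-subvolume brick1
end-volume
volume glusterfs1-hs
 type protocol/client
 option transport-type tcp
 option remote-host "ip"
 option ping-timeout 10
 option transport.socket.nodelay on
 option remote-subvolume brick1
end-volume
# "replicated" is now the topmost volume, so it is what gets mounted
volume replicated
 type cluster/replicate
 subvolumes glusterfs0-hs glusterfs1-hs
end-volume
```

The translators can then be stacked back on top of "replicated" one by one,
checking cpu usage after each change.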

regards,
On Mon, Feb 15, 2010 at 11:44 PM, Harshavardhana <harsha at gluster.com> wrote:

> Hi John,
>
> * replies inline *
> On Tue, Feb 16, 2010 at 12:43 AM, John Madden <jmadden at ivytech.edu> wrote:
>
> > I've made a few swings at using glusterfs for the php session store for a
> > heavily-used web app (~6 million pages daily) and I've found time and again
> > that cpu usage and odd load characteristics cause glusterfs to be entirely
> > unsuitable for this use case, at least given my configuration. I posted on
> > this earlier, but I'm hoping I can get some input on this as things are way
> > better than they were but still not good enough.  I'm on v2.0.9 as the 3.0.x
> > series doesn't seem to be fully settled yet, though feel free to correct me
> > on that.
> >
> > I have a two-node replicate setup and four clients.  Configs are below.
> > What I see is that one brick gets pegged (load avg of 8) and the other
> > sits much more idle (load avg of 1).  The pegged node ends up with high run
> > queues and i/o blocked processes.  CPU usage on the clients for the
> > glusterfs processes gets pretty high, consuming at least an entire cpu when
> > not spiking to consume both.  I have very high thread counts on the clients
> > to hopefully avoid thread waits on i/o requests.  All six machines are
> > identical xen instances.
> >
> >
> Comments about your vol file are as below:
>
> 1. A write-behind cache-size of 128MB is overkill; being that aggressive
>    over ethernet will not get you good performance.
> 2. A thread count of 100 is way beyond what the actual use case needs; in our
>    tests and deployments, 16 threads have catered to almost all cases.
> 3. quick-read and stat-prefetch will help if you have a large number of
>    small files. 3.0.2 has proper enhancements for this functionality.
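
Points 1 and 2 above could be applied to the posted client.vol roughly as in
the sketch below. The thread count of 16 comes from point 2. The write-behind
cache-size of 4MB is only an assumed illustrative value: the thread says that
128MB is too much but does not name a replacement figure.

```
volume writeback
 type performance/write-behind
 option cache-size 4MB # assumed value, not from this thread; was 128MB
 option flush-behind off
 subvolumes iocache
end-volume
volume iothreads
 type performance/io-threads
 option thread-count 16 # per point 2; was 100
 subvolumes writeback
end-volume
```

Everything else in the volumes is left exactly as posted, so any change in
behaviour can be attributed to these two options.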
>
> My suggestion is to divide "fsd.vol" into two vol files, so that there is a
> separate server for each backend export. In production deployments this has
> been seen to help scalability and performance.
>
> Also, volume files generated with "glusterfs-volgen" are better for all your
> needs.
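
A minimal sketch of what one half of a split "fsd.vol" might look like, here
for the sessions export. The file name is hypothetical; the volumes and
options are copied from the posted fsd.vol, and the elided address list is
left as it was. The data export would get a second file of the same shape
built from the data, locks0 and brick0 volumes.

```
# sessions.vol (hypothetical name): server for the sessions export only
volume sessions
 type storage/posix
 option directory /var/glusterfs/sessions
 option o-direct off
end-volume
volume locks1
 type features/locks
 option mandatory-locks on
 subvolumes sessions
end-volume
volume brick1
 type performance/io-threads
 option thread-count 32 # default is 16
 subvolumes locks1
end-volume
volume server
 type protocol/server
 option transport-type tcp
 option transport.socket.nodelay on
 subvolumes brick1
 option auth.addr.brick1.allow ip's...
end-volume
```

Two server processes on the same host would presumably each need their own
listening port, with the clients pointed at the matching one; that part is
not shown here.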
>
> > When one of the bricks is down, cpu usage across the board goes way down,
> > interactivity goes way up, and things seem overall to be a whole lot better.
> > Why is that?  I would think that having two nodes would at least result in
> > better read rates.
> >
> > I've gone through various caching schemes and tried readahead, writebehind,
> > quick-read, and stat-prefetch.  I found quick-read caused a ton of memory
> > consumption and didn't help on performance.  I didn't see much of a change
> > at all with stat-prefetch.
> >
> > ...Any thoughts?
> >
> > ### fsd.vol:
> >
> > volume sessions
> >  type storage/posix
> >  option directory /var/glusterfs/sessions
> >  option o-direct off
> > end-volume
> > volume data
> >  type storage/posix
> >  option directory /var/glusterfs/data
> >  option o-direct off
> > end-volume
> > volume locks0
> >  type features/locks
> >  option mandatory-locks on
> >  subvolumes data
> > end-volume
> > volume locks1
> >  type features/locks
> >  option mandatory-locks on
> >  subvolumes sessions
> > end-volume
> > volume brick0
> >  type performance/io-threads
> >  option thread-count 32 # default is 16
> >  subvolumes locks0
> > end-volume
> > volume brick1
> >  type performance/io-threads
> >  option thread-count 32 # default is 16
> >  subvolumes locks1
> > end-volume
> > volume server
> >  type protocol/server
> >  option transport-type tcp
> >  option transport.socket.nodelay on
> >  subvolumes brick0 brick1
> >  option auth.addr.brick0.allow ip's...
> >  option auth.addr.brick1.allow ip's...
> > end-volume
> >
> >
> > ### client.vol (just one connection shown here)
> >
> > volume glusterfs0-hs
> >  type protocol/client
> >  option transport-type tcp
> >  option remote-host "ip"
> >  option ping-timeout 10
> >  option transport.socket.nodelay on
> >  option remote-subvolume brick1
> > end-volume
> > volume glusterfs1-hs
> >  type protocol/client
> >  option transport-type tcp
> >  option remote-host "ip"
> >  option ping-timeout 10
> >  option transport.socket.nodelay on
> >  option remote-subvolume brick1
> > end-volume
> > volume replicated
> >  type cluster/replicate
> >  subvolumes glusterfs0-hs glusterfs1-hs
> > end-volume
> > volume iocache
> >  type performance/io-cache
> >  option cache-size 512MB
> >  option cache-timeout 10
> >  subvolumes replicated
> > end-volume
> > volume writeback
> >  type performance/write-behind
> >  option cache-size 128MB
> >  option flush-behind off
> >  subvolumes iocache
> > end-volume
> > volume iothreads
> >  type performance/io-threads
> >  option thread-count 100
> >  subvolumes writeback
> > end-volume
> >
> >
> >
> >
> >
> > --
> > John Madden
> > Sr UNIX Systems Engineer
> > Ivy Tech Community College of Indiana
> > jmadden at ivytech.edu
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> >
>

-- 
Raghavendra G