We used tc. Not that I've ever used tc before, so it was a bit of
guesswork, but ... where $interface is the main network interface:

--8<--
tc qdisc del dev $interface root
tc qdisc add dev $interface root handle 1: cbq avpkt 1000 bandwidth 1000mbit
tc class add dev $interface parent 1: classid 1:1 cbq rate 20mbit allot 1500 prio 5 bounded isolated

for otherIP in <other server IP addresses>
do
        tc filter add dev $interface parent 1: protocol ip prio 16 u32 match ip dst $otherIP flowid 1:1
done
--8<--

Cheers,
Kingsley.
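(As an aside, these are just the standard tc status commands and not part of the setup above, but they should confirm that the class and filters are in place and that inter-server traffic is being counted against the 20mbit class:

--8<--
# qdisc and class statistics ("Sent ... bytes" on class 1:1 should climb
# while the servers are talking to each other), plus the installed filters
tc -s qdisc show dev $interface
tc -s class show dev $interface
tc filter show dev $interface parent 1:
--8<--
)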
On Tue, 2015-02-24 at 18:12 +0530, Ravishankar N wrote:
> On 02/24/2015 04:46 PM, Kingsley wrote:
> > When testing gluster, I found similar issues when I simulated a brick
> > failure on a replicated volume - while it was rebuilding the newly
> > replaced brick, the volume was very unresponsive.
> >
> > Our bricks are on SATA drives and the server LAN runs at 1Gbps. The
> > disks couldn't cope with the IOPS that the network was throwing at them.
> >
> > I solved that particular issue by using traffic shaping to limit the
> > network bandwidth that the servers could use between each other (but not
> > limiting it to anywhere else). The volume took longer to rebuild the
> > replaced brick, but the volume was still responsive to clients during
> > the rebuild.
> >
> > Please let me know if what we tried is a bad idea ...
> The self-heal daemon (shd) which does the heals also runs on the
> servers. It is basically a process that loads some of the client side
> xlators so that it has a cluster view. It then connects to the bricks
> like a normal client and does the heals from the source to the sink. So
> limiting that bandwidth between the shd and the bricks so that the
> 'real' clients can connect to the bricks seems to support your findings.
>
> But what exactly did you do to limit the bandwidth? The gluster nfs
> server process also resides on the brick nodes. So maybe limiting the
> bandwidth between that and the brick processes would slow down nfs
> clients as well.
>
> Also, what version of gluster did you try this on? Beginning with 3.6,
> AFR has granular entry self-heals. Before this, (i.e. 3.5 and less) AFR
> used to take a full-lock on the directory and clients could not modify
> the directory contents until the heal was complete.
>
> Thanks,
> Ravi
> > Cheers,
> > Kingsley.
> >
> > On Tue, 2015-02-24 at 07:11 +0530, Ravishankar N wrote:
> >> On 02/24/2015 05:00 AM, Craig Yoshioka wrote:
> >>> I'm using Gluster 3.6 to host a volume with some KVM images. I'd seen before that other people were having terrible performance while Gluster was auto-healing but that a rewrite in 3.6 had potentially solved this problem.
> >>>
> >>> Well, it hasn't (for me). If my gluster volume starts to auto-heal, performance can get so bad that some of the VMs essentially lock up. In top I can see the glusterfsd process sometime hitting 700% of the CPU. Is there anything I can do to prevent this by throttling the healing process?
> >> For VM workloads, you could set the 'cluster.data-self-heal-algorithm'
> >> option to 'full'. The checksum computation in the 'diff' algorithm can
> >> be cpu intensive, especially since VM images are big files.
> >>
> >> [root@tuxpad glusterfs]# gluster v set help|grep algorithm
> >> Option: cluster.data-self-heal-algorithm
> >> Description: Select between "full", "diff". The "full" algorithm copies
> >> the entire file from source to sink. The "diff" algorithm copies to sink
> >> only those blocks whose checksums don't match with those of source. If
> >> no option is configured the option is chosen dynamically as follows: If
> >> the file does not exist on one of the sinks or empty file exists or if
> >> the source file size is about the same as page size the entire file will
> >> be read and written i.e "full" algo, otherwise "diff" algo is chosen.
> >>
> >> Hope this helps.
> >> Ravi
> >>
> >>> Here are my volume options:
> >>>
> >>> Volume Name: vm-images
> >>> Type: Replicate
> >>> Volume ID: 5b38ddbe-a1ae-4e10-b0ad-dcd785a44493
> >>> Status: Started
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: vmhost-1:/gfs/brick-0
> >>> Brick2: vmhost-2:/gfs/brick-0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> cluster.quorum-count: 1
> >>> network.frame-timeout: 1800
> >>> network.ping-timeout: 15
> >>> server.allow-insecure: on
> >>> storage.owner-gid: 36
> >>> storage.owner-uid: 107
> >>> performance.quick-read: off
> >>> performance.read-ahead: off
> >>> performance.io-cache: off
> >>> performance.stat-prefetch: off
> >>> cluster.eager-lock: enable
> >>> network.remote-dio: enable
> >>> cluster.quorum-type: fixed
> >>> cluster.server-quorum-type: server
> >>> cluster.server-quorum-ratio: 51%
> >>>
> >>> Thanks!
> >>> -Craig
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-users
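(For completeness, a minimal sketch of applying the option Ravi suggests above, using the volume name from Craig's output; a sketch only, not something tested in this thread:

--8<--
# switch data self-heal from the dynamically chosen default
# (usually "diff" for big VM images) to "full"
gluster volume set vm-images cluster.data-self-heal-algorithm full

# the option should now show up under "Options Reconfigured"
gluster volume info vm-images
--8<--
)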