Re: poor performance during healing

On 02/24/2015 04:46 PM, Kingsley wrote:
When testing gluster, I found similar issues when I simulated a brick
failure on a replicated volume - while it was rebuilding the newly
replaced brick, the volume was very unresponsive.

Our bricks are on SATA drives and the server LAN runs at 1Gbps. The
disks couldn't cope with the IOPS that the network was throwing at them.

I solved that particular issue by using traffic shaping to limit the
network bandwidth that the servers could use between each other (without
limiting traffic to anywhere else). The volume took longer to rebuild the
replaced brick, but it stayed responsive to clients during the rebuild.
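
For reference, a minimal sketch of that kind of shaping with Linux tc and
HTB; the interface, peer address, and rate cap below are placeholders
rather than Kingsley's actual values, and since tc shapes egress only, it
would be applied on each server:

# Server-to-server gluster traffic is capped at 300 Mbit/s (class 1:20);
# all other traffic keeps the full 1 Gbps link (class 1:10).
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 1gbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 300mbit ceil 300mbit
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dst 192.168.1.2/32 flowid 1:20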

Please let me know if what we tried is a bad idea ...

Cheers,
Kingsley.
The self-heal daemon (shd), which performs the heals, also runs on the servers. It is basically a process that loads some of the client-side xlators so that it has a cluster view; it then connects to the bricks like a normal client and copies data from the source brick to the sink. So limiting the bandwidth between the shd and the bricks, leaving the 'real' clients room to talk to the bricks, is consistent with your findings.
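
As a sanity check that the shd really is server-side, 'gluster volume status' lists a Self-heal Daemon entry next to the brick processes on every node. Illustrative, trimmed output for the volume discussed below (PIDs made up):

[root@tuxpad glusterfs]# gluster volume status vm-images
Gluster process                        Port    Online  Pid
----------------------------------------------------------
Brick vmhost-1:/gfs/brick-0            49152   Y       4321
Brick vmhost-2:/gfs/brick-0            49152   Y       4322
Self-heal Daemon on localhost          N/A     Y       4330
Self-heal Daemon on vmhost-2           N/A     Y       4331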

But what exactly did you do to limit the bandwidth? The gluster NFS server process also resides on the brick nodes, so limiting the bandwidth between it and the brick processes might slow down NFS clients as well.

Also, what version of gluster did you try this on? Beginning with 3.6, AFR does granular entry self-heals. Before that (i.e. 3.5 and earlier), AFR took a full lock on the directory, and clients could not modify the directory contents until the heal was complete.
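
To confirm the installed release and to watch which entries are still pending heal while clients run, the standard CLI can be used (VOLNAME is a placeholder):

[root@tuxpad glusterfs]# gluster --version
[root@tuxpad glusterfs]# gluster volume heal VOLNAME info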

Thanks,
Ravi

On Tue, 2015-02-24 at 07:11 +0530, Ravishankar N wrote:
On 02/24/2015 05:00 AM, Craig Yoshioka wrote:
I'm using Gluster 3.6 to host a volume with some KVM images. I'd seen before that other people were having terrible performance while Gluster was auto-healing, but that a rewrite in 3.6 had potentially solved this problem.

Well, it hasn't (for me). If my gluster volume starts to auto-heal, performance can get so bad that some of the VMs essentially lock up. In top I can see the glusterfsd process sometimes hitting 700% CPU. Is there anything I can do to prevent this by throttling the healing process?
For VM workloads, you could set the 'cluster.data-self-heal-algorithm'
option to 'full'. The checksum computation in the 'diff' algorithm can
be CPU intensive, especially since VM images are big files.

[root@tuxpad glusterfs]# gluster v set help|grep algorithm
Option: cluster.data-self-heal-algorithm
Description: Select between "full", "diff". The "full" algorithm copies
the entire file from source to sink. The "diff" algorithm copies to sink
only those blocks whose checksums don't match with those of source. If
no option is configured the option is chosen dynamically as follows: If
the file does not exist on one of the sinks or empty file exists or if
the source file size is about the same as page size the entire file will
be read and written i.e "full" algo, otherwise "diff" algo is chosen.
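
For example, applied to the volume from the original mail below (this only changes how future data self-heals are performed):

[root@tuxpad glusterfs]# gluster v set vm-images cluster.data-self-heal-algorithm full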

Hope this helps.
Ravi

Here are my volume options:

Volume Name: vm-images
Type: Replicate
Volume ID: 5b38ddbe-a1ae-4e10-b0ad-dcd785a44493
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: vmhost-1:/gfs/brick-0
Brick2: vmhost-2:/gfs/brick-0
Options Reconfigured:
nfs.disable: on
cluster.quorum-count: 1
network.frame-timeout: 1800
network.ping-timeout: 15
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 107
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: fixed
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%

Thanks!
-Craig

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users




