We used tc. Not that I've ever used tc before, so it was a bit of
guesswork, but ... where $interface is the main network interface:

--8<--
tc qdisc del dev $interface root
tc qdisc add dev $interface root handle 1: cbq avpkt 1000 bandwidth 1000mbit
tc class add dev $interface parent 1: classid 1:1 cbq rate 20mbit allot 1500 prio 5 bounded isolated

for otherIP in <other server IP addresses>
do
        tc filter add dev $interface parent 1: protocol ip prio 16 u32 match ip dst $otherIP flowid 1:1
done
--8<--

Cheers,
Kingsley.
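(As an aside, these are just the standard tc status commands and not part of the setup above, but they should confirm that the class and filters are in place and that inter-server traffic is being counted against the 20mbit class:

--8<--
# qdisc and class statistics ("Sent ... bytes" on class 1:1 should climb
# while the servers are talking to each other), plus the installed filters
tc -s qdisc show dev $interface
tc -s class show dev $interface
tc filter show dev $interface parent 1:
--8<--
)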
On Tue, 2015-02-24 at 18:12 +0530, Ravishankar N wrote:
> On 02/24/2015 04:46 PM, Kingsley wrote:
> > When testing gluster, I found similar issues when I simulated a brick
> > failure on a replicated volume - while it was rebuilding the newly
> > replaced brick, the volume was very unresponsive.
> >
> > Our bricks are on SATA drives and the server LAN runs at 1Gbps. The
> > disks couldn't cope with the IOPS that the network was throwing at them.
> >
> > I solved that particular issue by using traffic shaping to limit the
> > network bandwidth that the servers could use between each other (but not
> > limiting it to anywhere else). The volume took longer to rebuild the
> > replaced brick, but the volume was still responsive to clients during
> > the rebuild.
> >
> > Please let me know if what we tried is a bad idea ...
> The self-heal daemon (shd) which does the heals also runs on the
> servers. It is basically a process that loads some of the client side
> xlators so that it has a cluster view. It then connects to the bricks
> like a normal client and does the heals from the source to the sink. So
> limiting that bandwidth between the shd and the bricks so that the
> 'real' clients can connect to the bricks seems to support your findings.
>
> But what exactly did you do to limit the bandwidth? The gluster nfs
> server process also resides on the brick nodes. So maybe limiting the
> bandwidth between that and the brick processes would slow down nfs
> clients as well.
>
> Also, what version of gluster did you try this on? Beginning with 3.6,
> AFR has granular entry self-heals. Before this, (i.e. 3.5 and less) AFR
> used to take a full-lock on the directory and clients could not modify
> the directory contents until the heal was complete.
>
> Thanks,
> Ravi
> > Cheers,
> > Kingsley.
> >
> > On Tue, 2015-02-24 at 07:11 +0530, Ravishankar N wrote:
> >> On 02/24/2015 05:00 AM, Craig Yoshioka wrote:
> >>> I'm using Gluster 3.6 to host a volume with some KVM images. I'd seen before that other people were having terrible performance while Gluster was auto-healing but that a rewrite in 3.6 had potentially solved this problem.
> >>>
> >>> Well, it hasn't (for me). If my gluster volume starts to auto-heal, performance can get so bad that some of the VMs essentially lock up. In top I can see the glusterfsd process sometime hitting 700% of the CPU. Is there anything I can do to prevent this by throttling the healing process?
> >> For VM workloads, you could set the 'cluster.data-self-heal-algorithm'
> >> option to 'full'. The checksum computation in the 'diff' algorithm can
> >> be cpu intensive, especially since VM images are big files.
> >>
> >> [root@tuxpad glusterfs]# gluster v set help|grep algorithm
> >> Option: cluster.data-self-heal-algorithm
> >> Description: Select between "full", "diff". The "full" algorithm copies
> >> the entire file from source to sink. The "diff" algorithm copies to sink
> >> only those blocks whose checksums don't match with those of source. If
> >> no option is configured the option is chosen dynamically as follows: If
> >> the file does not exist on one of the sinks or empty file exists or if
> >> the source file size is about the same as page size the entire file will
> >> be read and written i.e "full" algo, otherwise "diff" algo is chosen.
> >>
> >> Hope this helps.
> >> Ravi
> >>
> >>> Here are my volume options:
> >>>
> >>> Volume Name: vm-images
> >>> Type: Replicate
> >>> Volume ID: 5b38ddbe-a1ae-4e10-b0ad-dcd785a44493
> >>> Status: Started
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: vmhost-1:/gfs/brick-0
> >>> Brick2: vmhost-2:/gfs/brick-0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> cluster.quorum-count: 1
> >>> network.frame-timeout: 1800
> >>> network.ping-timeout: 15
> >>> server.allow-insecure: on
> >>> storage.owner-gid: 36
> >>> storage.owner-uid: 107
> >>> performance.quick-read: off
> >>> performance.read-ahead: off
> >>> performance.io-cache: off
> >>> performance.stat-prefetch: off
> >>> cluster.eager-lock: enable
> >>> network.remote-dio: enable
> >>> cluster.quorum-type: fixed
> >>> cluster.server-quorum-type: server
> >>> cluster.server-quorum-ratio: 51%
> >>>
> >>> Thanks!
> >>> -Craig
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-users
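(For completeness, a minimal sketch of applying the option Ravi suggests above, using the volume name from Craig's output; a sketch only, not something tested in this thread:

--8<--
# switch data self-heal from the dynamically chosen default
# (usually "diff" for big VM images) to "full"
gluster volume set vm-images cluster.data-self-heal-algorithm full

# the option should now show up under "Options Reconfigured"
gluster volume info vm-images
--8<--
)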