Re: Gluster-users Digest, Vol 74, Issue 11

On 06/10/2014 02:00 PM, gluster-users-request@xxxxxxxxxxx wrote:
From: Laurent Chouinard <laurent.chouinard@xxxxxxxxxxx>
To: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re: Unavailability during self-heal for large volumes
Message-ID:
 <95ea1865fac2484980d020c6a3b7f0cd@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

> Laurent,
>    This has been improved significantly in afr-v2 (the enhanced version of the
> replication translator in gluster), which will be released with 3.6 I believe.
> The issue happens because of the directory self-heal in the older versions. In
> the new version, per-file healing in a directory is performed instead of a full
> directory heal at once, which was creating a lot of traffic. Unfortunately,
> this is too big a change to backport to older releases :-(.
>
> Pranith
Hi Pranith,

Thank you for this information. 

Do you think there is a way to limit/throttle the current directory 
self-heal then? I don't mind if it takes a long time.

Alternatively, is there a way to disable the healing system entirely? I would 
consider running a manual healing operation by STAT'ing every file, which 
would allow me to throttle the speed to a more manageable level.
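
For illustration, something along these lines is what I have in mind (just a 
sketch; the mount point and the delay are placeholders, not values from our 
setup):

#!/usr/bin/env python
# Sketch only: walk a GlusterFS FUSE client mount and stat every entry,
# pausing between files so the self-heal traffic this triggers stays at a
# manageable rate. MOUNT and DELAY are assumptions to be tuned per cluster.
import os
import time

MOUNT = "/mnt/glustervol"   # hypothetical client mount point
DELAY = 0.05                # seconds to sleep between entries

for dirpath, dirnames, filenames in os.walk(MOUNT):
    for name in dirnames + filenames:
        path = os.path.join(dirpath, name)
        try:
            os.lstat(path)  # a lookup/stat is enough to make the client check the file
        except OSError:
            pass            # entry may have vanished mid-walk; skip it
        time.sleep(DELAY)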

Thanks,

Laurent Chouinard
You could try this:

http://www.gluster.org/author/andrew-lau/

by Andrew Lau on February 3, 2014

Controlling glusterfsd CPU outbreaks with cgroups

Some of you may know that same feeling when adding a new brick to your gluster replicated volume which already holds in excess of 1TB of data, and suddenly your gluster server has shot up to 500% CPU usage. What's worse, my hosts run alongside oVirt, so while gluster hogged all the CPU my VMs started to crawl; even running simple commands like top would take 30+ seconds. Not a good feeling.

In my first attempt I limited the NIC's bandwidth to 200 Mbps rather than the 2x1 Gbps aggregated link, and this calmed glusterfsd down to a healthy 50%. A temporary fix, however, which meant clients accessing gluster storage would be bottlenecked by that shared limit.

So off to the mailing list - a great suggestion from James/purpleidea (https://ttboj.wordpress.com/code/puppet-gluster/) on using cgroups.

The concept is simple: we limit the total CPU glusterfsd sees, so when it comes to doing the checksums for self-heals, replication, etc., it won't have the high priority that other services, such as running VMs, would have. This effectively slows down the replication rate in return for lower CPU usage.
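
Roughly, the cgroup setup boils down to something like the following (a sketch 
only, not taken from Andrew's post; the group name, the shares value and the 
cgroup v1 mount point are assumptions):

#!/usr/bin/env python
# Sketch: put all running glusterfsd processes into a low-priority cpu cgroup
# (cgroup v1, assumed mounted at /sys/fs/cgroup/cpu). The group name and the
# shares value below are illustrative placeholders.
import os
import subprocess

CGROUP = "/sys/fs/cgroup/cpu/glusterfsd-throttle"
CPU_SHARES = "256"  # default is 1024, so this is roughly quarter priority

os.makedirs(CGROUP, exist_ok=True)  # Python 3.2+; creating the dir creates the cgroup

with open(os.path.join(CGROUP, "cpu.shares"), "w") as f:
    f.write(CPU_SHARES)

# Move every glusterfsd PID into the cgroup (one PID per write to "tasks").
pids = subprocess.check_output(["pidof", "glusterfsd"]).split()
for pid in pids:
    with open(os.path.join(CGROUP, "tasks"), "w") as f:
        f.write(pid.decode())

Because cpu.shares is a relative weight rather than a hard cap, glusterfsd can 
still use idle CPU, but it yields to the VMs as soon as they need it.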

Kind regards,

Jorick Astrego
Netbulae B.V.
http://www.netbulae.eu

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
