On February 25, 2016 8:32:44 PM PST, Kyle Maas <kyle@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>On 02/25/2016 08:20 PM, Ravishankar N wrote:
>> On 02/25/2016 11:36 PM, Kyle Maas wrote:
>>> How can I tell what AFR version a cluster is using for self-heal?
>> If all your servers and clients are 3.7.8, then they are by default
>> running afr-v2. Afr-v2 was a re-write of afr that went in for 3.6,
>> so any gluster package from then on has this code; you don't need to
>> explicitly enable anything.
>
>That was what I thought until I ran across this IRC log where JoeJulian
>asked if it was explicitly enabled:
>
>https://irclog.perlgeek.de/gluster/2015-10-29
>

A couple of lines down, though, I continued, "Ah, I was confusing that
with nsr."

>>>
>>> The reason I ask is that I have a two-node replicated 3.7.8 cluster
>>> (no arbiters) which has locking behavior during self-heal that looks
>>> very similar to that of AFRv1 (only heals one file at a time per
>>> self-heal daemon, appears to lock the full inode while it's healing
>>> it instead of just ranges, etc.).
>> Both v1 and v2 use range locks while healing a given file, so clients
>> shouldn't block when heals happen. What is the problem you're facing?
>> Are your clients also at 3.7.8?
>
>Primary symptoms are:
>
>1. While a self-heal is running, only one file at a time is healed per
>brick. As I understand it, AFRv2 and up should allow multiple files to
>be healed concurrently, or at least multiple ranges within a file,
>particularly with io-thread-count set to >1. During a self-heal,
>neither I/O nor network is saturated, which leads me to believe that
>I'm looking at a single synchronous self-healing process.
>
>3. More troubling is that during a self-heal, clients cannot so much as
>list the files on the volume until the self-heal is done. No errors.
>No timeouts. They just freeze. As soon as the self-heal is complete,
>they unfreeze and list the contents.
>
>4. Any file access during a self-heal also freezes, just like a
>directory listing, until the self-heal is done. This wreaks havoc on
>users who have files open when one of the bricks is rebooted and has
>to be healed, since with as much data as is stored on this cluster, a
>self-heal can take almost 24 hours.
>
>I experience the same problems when I run without any clients other
>than the bricks themselves mounting the volume, so yes, it happens
>with the clients on 3.7.8 as well.
>
>Warm Regards,
>Kyle Maas

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
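
A quick way to confirm that every node really is on the 3.7.8 bits (and
therefore afr-v2) is to check the installed version and glusterd's
operating version on each machine. This is only a sketch assuming the
stock CLI and the default glusterd state path; the path may differ by
distribution:

  # on every server and every client
  gluster --version       # management CLI / installed package version
  glusterfs --version     # fuse client binary version

  # on each server: the cluster-wide operating version glusterd is using
  # (default state path; adjust if your distribution relocates it)
  grep operating-version /var/lib/glusterd/glusterd.info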
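
To see what the self-heal daemon is actually doing while clients hang,
and which heal-related options are in effect, something like the
following can help. Again a sketch: <VOLNAME> is a placeholder for the
affected volume, and the statedump location assumes the default
/var/run/gluster; if this build lacks "volume get", non-default options
also show up in "gluster volume info".

  # pending heals and per-brick heal counts
  gluster volume heal <VOLNAME> info
  gluster volume heal <VOLNAME> statistics heal-count

  # heal-related options currently in effect
  gluster volume get <VOLNAME> cluster.data-self-heal-algorithm
  gluster volume get <VOLNAME> cluster.background-self-heal-count
  gluster volume get <VOLNAME> cluster.self-heal-window-size
  gluster volume get <VOLNAME> performance.io-thread-count

  # dump brick state, then look at the granted/blocked inodelk entries
  # to see whether a full-file lock is being held during the heal
  gluster volume statedump <VOLNAME>
  grep -A2 inodelk /var/run/gluster/*.dump.*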