On February 25, 2016 8:32:44 PM PST, Kyle Maas <kyle@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>On 02/25/2016 08:20 PM, Ravishankar N wrote:
>> On 02/25/2016 11:36 PM, Kyle Maas wrote:
>>> How can I tell what AFR version a cluster is using for self-heal?
>> If all your servers and clients are 3.7.8, then they are by default
>> running afr-v2. Afr-v2 was a re-write of afr that went in for 3.6,
>> so any gluster package from then on has this code; you don't need to
>> explicitly enable anything.
>
>That was what I thought until I ran across this IRC log where JoeJulian
>asked if it was explicitly enabled:
>
>https://irclog.perlgeek.de/gluster/2015-10-29
>

A couple of lines down, though, I continued, "Ah, I was confusing that
with nsr."

>>>
>>> The reason I ask is that I have a two-node replicated 3.7.8 cluster
>>> (no arbiters) which has locking behavior during self-heal that looks
>>> very similar to that of AFRv1 (only heals one file at a time per
>>> self-heal daemon, appears to lock the full inode while it's healing
>>> it instead of just ranges, etc.).
>> Both v1 and v2 use range locks while healing a given file, so clients
>> shouldn't block when heals happen. What is the problem you're facing?
>> Are your clients also at 3.7.8?
>
>Primary symptoms are:
>
>1. While a self-heal is running, only one file at a time is healed per
>brick. As I understand it, AFRv2 and up should allow multiple files to
>be healed concurrently, or at least multiple ranges within a file,
>particularly with io-thread-count set to >1. During a self-heal,
>neither I/O nor network is saturated, which leads me to believe that
>I'm looking at a single synchronous self-healing process.
>
>3. More troubling is that during a self-heal, clients cannot so much as
>list the files on the volume until the self-heal is done. No errors.
>No timeouts. They just freeze. As soon as the self-heal is complete,
>they unfreeze and list the contents.
>
>4. Any file access during a self-heal also freezes, just like a
>directory listing, until the self-heal is done. This wreaks havoc on
>users who have files open when one of the bricks is rebooted and has
>to be healed, since with as much data as is stored on this cluster, a
>self-heal can take almost 24 hours.
>
>I experience the same problems when I run without any clients other
>than the bricks themselves mounting the volume, so yes, it happens
>with the clients on 3.7.8 as well.
>
>Warm Regards,
>Kyle Maas

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
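
A quick way to confirm that every node really is on the 3.7.8 bits (and
therefore afr-v2) is to check the installed version and glusterd's
operating version on each machine. This is only a sketch assuming the
stock CLI and the default glusterd state path; the path may differ by
distribution:

  # on every server and every client
  gluster --version       # management CLI / installed package version
  glusterfs --version     # fuse client binary version

  # on each server: the cluster-wide operating version glusterd is using
  # (default state path; adjust if your distribution relocates it)
  grep operating-version /var/lib/glusterd/glusterd.info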
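
To see what the self-heal daemon is actually doing while clients hang,
and which heal-related options are in effect, something like the
following can help. Again a sketch: <VOLNAME> is a placeholder for the
affected volume, and the statedump location assumes the default
/var/run/gluster; if this build lacks "volume get", non-default options
also show up in "gluster volume info".

  # pending heals and per-brick heal counts
  gluster volume heal <VOLNAME> info
  gluster volume heal <VOLNAME> statistics heal-count

  # heal-related options currently in effect
  gluster volume get <VOLNAME> cluster.data-self-heal-algorithm
  gluster volume get <VOLNAME> cluster.background-self-heal-count
  gluster volume get <VOLNAME> cluster.self-heal-window-size
  gluster volume get <VOLNAME> performance.io-thread-count

  # dump brick state, then look at the granted/blocked inodelk entries
  # to see whether a full-file lock is being held during the heal
  gluster volume statedump <VOLNAME>
  grep -A2 inodelk /var/run/gluster/*.dump.*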