Re: Copy operation freezes. Lots of locks in state BLOCKED (3-node setup with 1 arbiter)

Adrian Gruntkowski <adrian.gruntkowski@xxxxxxxxx> · Wed, 4 Nov 2015 16:40:50 +0100

Hello,
I applied Pranith's patch myself to the current 3.7.5 release and rebuilt the packages. Unfortunately, the issue is still there :( It behaves exactly the same.

Regards,
Adrian

2015-10-28 12:02 GMT+01:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

    On 10/28/2015 04:27 PM, Adrian Gruntkowski wrote:

      Hello Pranith,

        Thank you for the prompt reaction. I didn't get back to this
          until now because I had other problems to deal with.

        Is there a chance that it will be released this month or
          next? If not, I will probably have to resort to compiling it
          on my own.

    I am planning to get this into 3.7.6, which is to be released by
    the end of this month; I guess in 4-5 days :-). I will keep you updated.

    Pranith

        Regards,
        Adrian

        2015-10-26 12:37 GMT+01:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                  On 10/23/2015 10:10 AM, Ravishankar N wrote:

                    On 10/21/2015 05:55 PM, Adrian Gruntkowski wrote:

                      Hello,

                        I'm trying to track down a problem with my setup
                        (version 3.7.3 on Debian stable).

                        I have a couple of volumes set up in a 3-node
                        configuration with 1 brick as an arbiter for
                        each.

                        There are 4 volumes set up in a cross-over
                        fashion across 3 physical servers, like this:

                                      +---------------->[ GigabitEthernet switch ]<-----------------+
                                      |                              ^                              |
                                      |                              |                              |
                                      V                              V                              V
                        /---------------------------\  /---------------------------\  /---------------------------\
                        | web-rep                   |  | cluster-rep               |  | mail-rep                  |
                        |                           |  |                           |  |                           |
                        | vols:                     |  | vols:                     |  | vols:                     |
                        | system_www1               |  | system_www1               |  | system_www1(arbiter)      |
                        | data_www1                 |  | data_www1                 |  | data_www1(arbiter)        |
                        | system_mail1(arbiter)     |  | system_mail1              |  | system_mail1              |
                        | data_mail1(arbiter)       |  | data_mail1                |  | data_mail1                |
                        \---------------------------/  \---------------------------/  \---------------------------/

                        Now, after a fresh boot-up, everything seems to
                        be running fine.

                        Then I start copying big files (KVM disk images)
                        from the local disk to the gluster mounts.

                        In the beginning it seems to run fine (although
                        iowait seems to go so high that it clogs up I/O
                        operations at some moments, but that's an issue
                        for later). After some time the transfer freezes;
                        then, after some (long) time, it advances in a
                        short burst, only to freeze again. Another
                        interesting thing is that I see a constant flow of
                        network traffic on the interfaces dedicated to
                        gluster, even during a "freeze".

                        I ran "gluster volume statedump" during the
                        transfer (the file is copied from the local disk
                        on cluster-rep onto the local mount of the
                        "system_www1" volume). I observed the following
                        section in the dump for the cluster-rep node:

                        [xlator.features.locks.system_www1-locks.inode]

                        path=/images/101/vm-101-disk-1.qcow2

                        mandatory=0

                        inodelk-count=12

                        lock-dump.domain.domain=system_www1-replicate-0:self-heal

                        inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
                        start=0, len=0, pid = 18446744073709551610,
                        owner=c811600cd67f0000, client=0x7fbe100df280,
                        connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,

                        granted at 2015-10-21 11:36:22

                        lock-dump.domain.domain=system_www1-replicate-0

                        inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
                        start=2195849216, len=131072, pid =
                        18446744073709551610, owner=c811600cd67f0000,
                        client=0x7fbe100df280,
                        connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,

                        granted at 2015-10-21 11:37:45

                        inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0,
                        start=9223372036854775805, len=1, pid =
                        18446744073709551610, owner=c811600cd67f0000,
                        client=0x7fbe100df280,
                        connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,

                        granted at 2015-10-21 11:36:22

                    From the statedump, it looks like the self-heal
                    daemon had taken locks to heal the file, which is
                    why the locks requested by the client (mount) are
                    in the BLOCKED state.
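The pid on the ACTIVE locks above, 18446744073709551610, is easier to read as a signed 64-bit value. The minimal Python sketch below (the helper name as_signed64 is hypothetical) reinterprets it and shows that it is -6. Our assumption, which should be checked against glusterfs.h for the release in use, is that GlusterFS tags internal clients such as the self-heal daemon with negative pids; that would fit the self-heal reading above, whereas the BLOCKED locks from the mount show pid = 0.

```python
import struct


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    # Pack as unsigned 64-bit, then unpack the same 8 bytes as signed 64-bit.
    return struct.unpack("<q", struct.pack("<Q", value))[0]


# pid of the ACTIVE locks in the statedump above
print(as_signed64(18446744073709551610))  # -6
```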

                    In arbiter volumes, the client (mount) takes a
                    full-file lock (start=0, len=0) for every write(),
                    as opposed to normal replica volumes, which take a
                    range lock (i.e. with the start and len values of
                    that write()). This is done to avoid network
                    split-brains.

                    So in normal replica volumes, clients can still
                    write to a file while a heal is going on, as long
                    as the offsets don't overlap. This is not the case
                    with arbiter volumes.
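To make the difference concrete, here is a minimal Python sketch of a byte-range conflict check for two WRITE locks. It assumes the statedump convention that len=0 means "from start to the end of the file"; the Range type and conflicts() helper are hypothetical illustrations, not GlusterFS code. The heal lock's offsets are taken from the statedump above, and the replica-style write range is an arbitrary example.

```python
from typing import NamedTuple

EOF = float("inf")  # stands in for "end of file" when len=0


class Range(NamedTuple):
    start: int
    length: int  # 0 means "to end of file", as in the statedump

    @property
    def end(self) -> float:
        # Exclusive end offset of the locked byte range.
        return EOF if self.length == 0 else self.start + self.length


def conflicts(a: Range, b: Range) -> bool:
    """Two WRITE locks conflict if their byte ranges overlap."""
    return a.start < b.end and b.start < a.end


# Range lock held by the self-heal daemon in the statedump above.
heal_lock = Range(start=2195849216, length=131072)

# Arbiter volume: the mount locks the whole file for every write().
arbiter_write = Range(start=0, length=0)

# Plain replica: the mount locks only the range it writes (example offsets).
replica_write = Range(start=0, length=131072)

print(conflicts(heal_lock, arbiter_write))  # True: must wait for the heal
print(conflicts(heal_lock, replica_write))  # False: can proceed
```

Because a whole-file range overlaps every other range, each write() on an arbiter volume has to wait for any heal lock on the same file, which matches the queue of BLOCKED entries in the dump.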

                    You can look at the client or glustershd logs to see
                    if there are messages that indicate healing of a
                    file, something along the lines of "Completed data
                    selfheal on xxx"
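A minimal Python sketch for that check, equivalent to a grep: it filters log lines for the self-heal completion message. The substring "Completed data selfheal" comes from the message quoted above, but the exact wording may differ between releases, and the helper name heal_messages is hypothetical.

```python
from typing import Iterable, Iterator

PATTERN = "Completed data selfheal"  # wording may vary between releases


def heal_messages(lines: Iterable[str], pattern: str = PATTERN) -> Iterator[str]:
    """Yield log lines that contain the self-heal completion message."""
    for line in lines:
        if pattern in line:
            yield line.rstrip("\n")
```

For example, assuming the self-heal daemon logs to /var/log/glusterfs/glustershd.log (a common default, but check your installation): `with open("/var/log/glusterfs/glustershd.log") as log: print(*heal_messages(log), sep="\n")`.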

              hi Adrian,

              Thanks for taking the time to send this mail. I raised
              this as bug https://bugzilla.redhat.com/show_bug.cgi?id=1275247,
              and a fix is posted for review at
              http://review.gluster.com/#/c/12426/

                  Pranith

                      inodelk.inodelk[2](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=c4fd2d78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[3](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=dc752e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[4](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=34832e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[5](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=d44d2e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[6](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=306f2e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[7](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=8c902e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[8](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=782c2e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[9](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=1c0b2e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        inodelk.inodelk[10](BLOCKED)=type=WRITE,
                        whence=0, start=0, len=0, pid = 0,
                        owner=24332e78487f0000, client=0x7fbe100e1380,
                        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,

                        blocked at 2015-10-21 11:37:45

                        There seem to be multiple locks in the BLOCKED
                        state, which doesn't look normal to me. The other
                        2 nodes have only 2 ACTIVE locks at the same time.
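When comparing dumps from several nodes, a small summary helps. Below is a minimal Python sketch that counts inodelk entries in a statedump by lock domain, state (ACTIVE/BLOCKED) and whether the lock covers the whole file (start=0, len=0). It assumes each lock entry sits on a single line, as in the raw dump file (the mail client wrapped them above); the function name summarise_inodelks is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

DOMAIN_PREFIX = "lock-dump.domain.domain="
LOCK_RE = re.compile(r"inodelk\.inodelk\[\d+\]\((ACTIVE|BLOCKED)\)=(.*)")
FIELD_RE = re.compile(r"([\w-]+)\s*=\s*([^,]+)")  # key=value, tolerates "pid = 0"


def summarise_inodelks(lines: Iterable[str]) -> Counter:
    """Count inodelk entries per (domain, state, whole_file) tuple."""
    counts: Counter = Counter()
    domain = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(DOMAIN_PREFIX):
            # Every lock listed after this line belongs to this domain.
            domain = line[len(DOMAIN_PREFIX):]
            continue
        match = LOCK_RE.match(line)
        if not match:
            continue
        state, rest = match.groups()
        fields = dict(FIELD_RE.findall(rest))
        # start=0, len=0 is a lock on the whole file.
        whole_file = fields.get("start") == "0" and fields.get("len") == "0"
        counts[(domain, state, whole_file)] += 1
    return counts
```

Running it over each node's dump file should show, for the cluster-rep dump above, a row of whole-file BLOCKED locks in the system_www1-replicate-0 domain queued behind the ranged ACTIVE ones.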

                        Below is the "gluster volume info" output.

                        # gluster volume info

                        Volume Name: data_mail1
                        Type: Replicate
                        Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
                        Status: Started
                        Number of Bricks: 1 x 3 = 3
                        Transport-type: tcp
                        Bricks:
                        Brick1: cluster-rep:/GFS/data/mail1
                        Brick2: mail-rep:/GFS/data/mail1
                        Brick3: web-rep:/GFS/data/mail1
                        Options Reconfigured:
                        performance.readdir-ahead: on
                        cluster.quorum-count: 2
                        cluster.quorum-type: fixed
                        cluster.server-quorum-ratio: 51%

                        Volume Name: data_www1
                        Type: Replicate
                        Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
                        Status: Started
                        Number of Bricks: 1 x 3 = 3
                        Transport-type: tcp
                        Bricks:
                        Brick1: cluster-rep:/GFS/data/www1
                        Brick2: web-rep:/GFS/data/www1
                        Brick3: mail-rep:/GFS/data/www1
                        Options Reconfigured:
                        performance.readdir-ahead: on
                        cluster.quorum-type: fixed
                        cluster.quorum-count: 2
                        cluster.server-quorum-ratio: 51%

                        Volume Name: system_mail1
                        Type: Replicate
                        Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
                        Status: Started
                        Number of Bricks: 1 x 3 = 3
                        Transport-type: tcp
                        Bricks:
                        Brick1: cluster-rep:/GFS/system/mail1
                        Brick2: mail-rep:/GFS/system/mail1
                        Brick3: web-rep:/GFS/system/mail1
                        Options Reconfigured:
                        performance.readdir-ahead: on
                        cluster.quorum-type: none
                        cluster.quorum-count: 2
                        cluster.server-quorum-ratio: 51%

                        Volume Name: system_www1
                        Type: Replicate
                        Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
                        Status: Started
                        Number of Bricks: 1 x 3 = 3
                        Transport-type: tcp
                        Bricks:
                        Brick1: cluster-rep:/GFS/system/www1
                        Brick2: web-rep:/GFS/system/www1
                        Brick3: mail-rep:/GFS/system/www1
                        Options Reconfigured:
                        performance.readdir-ahead: on
                        cluster.quorum-type: none
                        cluster.quorum-count: 2
                        cluster.server-quorum-ratio: 51%

                        The issue does not occur when I get rid of the
                        3rd (arbiter) brick.

                    What do you mean by 'getting rid of'? Killing the
                    3rd brick process of the volume?

                    Regards,

                    Ravi

                        If there's any additional information that is
                        missing and I could provide, please let me know.

                        Greetings,

                        Adrian

                      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
