Re: Copy operation freezes. Lots of locks in state BLOCKED (3-node setup with 1 arbiter)

Ravishankar N <ravishankar@xxxxxxxxxx> · Fri, 23 Oct 2015 10:10:44 +0530



    On 10/21/2015 05:55 PM, Adrian Gruntkowski wrote:

    
      Hello,

        
        I'm trying to track down a problem with my setup (version 3.7.3
        on Debian stable).

        I have a couple of volumes set up in a 3-node configuration, with
        one brick as an arbiter for each.

        There are 4 volumes, set up crosswise across 3 physical servers,
        like this:

        
                    ------------->[ GigabitEthernet switch ]<--------------
                    |                          ^                          |
                    |                          |                          |
                    V                          V                          V
        /-----------------------\  /-----------------------\  /-----------------------\
        | web-rep               |  | cluster-rep           |  | mail-rep              |
        |                       |  |                       |  |                       |
        | vols:                 |  | vols:                 |  | vols:                 |
        | system_www1           |  | system_www1           |  | system_www1(arbiter)  |
        | data_www1             |  | data_www1             |  | data_www1(arbiter)    |
        | system_mail1(arbiter) |  | system_mail1          |  | system_mail1          |
        | data_mail1(arbiter)   |  | data_mail1            |  | data_mail1            |
        \-----------------------/  \-----------------------/  \-----------------------/

        
        Now, after a fresh boot-up, everything seems to be running fine.
        Then I start copying big files (KVM disk images) from local disk
        to gluster mounts.

        In the beginning it seems to run fine (although iowait sometimes
        gets so high that it clogs up I/O operations, but that's an issue
        for later). After some time the transfer freezes; then, after a
        (long) while, it advances in a short burst, only to freeze again.
        Another interesting thing is that I see a constant flow of network
        traffic on the interfaces dedicated to gluster, even during a
        "freeze".

        
        I ran "gluster volume statedump" during the transfer (the file is
        being copied from the local disk on cluster-rep onto the local
        mount of the "system_www1" volume) and observed the following
        section in the dump for the cluster-rep node:

        
        [xlator.features.locks.system_www1-locks.inode]

        path=/images/101/vm-101-disk-1.qcow2

        mandatory=0

        inodelk-count=12

        lock-dump.domain.domain=system_www1-replicate-0:self-heal

        inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
        pid = 18446744073709551610, owner=c811600cd67f0000,
        client=0x7fbe100df280,
        connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
        granted at 2015-10-21 11:36:22

        lock-dump.domain.domain=system_www1-replicate-0

        inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
        start=2195849216, len=131072, pid = 18446744073709551610,
        owner=c811600cd67f0000, client=0x7fbe100df280,
        connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
        granted at 2015-10-21 11:37:45

        inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0,
        start=9223372036854775805, len=1, pid = 18446744073709551610,
        owner=c811600cd67f0000, client=0x7fbe100df280,
        connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
        granted at 2015-10-21 11:36:22

      
    From the statedump, it looks like the self-heal daemon has taken
    locks to heal the file, which is why the locks attempted by the
    client (mount) are in BLOCKED state.
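The pid on the ACTIVE locks is a quick way to cross-check this. 18446744073709551610 is 2^64 - 6, i.e. the number -6 printed as an unsigned 64-bit integer. If I recall the glusterfs sources correctly, negative pids are reserved for internal clients and -6 is the one used by the self-heal daemon (GF_CLIENT_PID_SELF_HEALD); treat that mapping as an assumption and check it against the headers of your version. The BLOCKED locks, by contrast, have pid = 0 and come from a different connection (cluster-vm-3846 rather than cluster-vm-3603). A minimal sketch of the conversion:

```python
def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    value &= (1 << 64) - 1  # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value


# pid printed for the ACTIVE locks in the statedump
print(to_signed64(18446744073709551610))  # -6
# pid printed for the BLOCKED locks
print(to_signed64(0))  # 0
```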

    In arbiter volumes, the client (mount) takes a full lock (start=0,
    len=0) for every write(), as opposed to normal replica volumes,
    which take range locks (i.e. the appropriate start and len values)
    for that write(). This is done to avoid network split-brains.

    So in normal replica volumes, clients can still write to a file
    while a heal is in progress, as long as the offsets don't overlap.
    This is not the case with arbiter volumes.
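The difference between full and range locks can be modelled as a byte-range overlap test. In the statedump, len=0 means the lock runs to the end of the file, so start=0, len=0 covers the whole file. The sketch below is a simplified model, not the locks translator's actual code: it only handles two WRITE locks held by different owners in the same lock domain (system_www1-replicate-0 here), as in the dump. The (0, 131072) lock in the usage lines is a hypothetical replica-style write, not something from the dump.

```python
from typing import Optional, Tuple

Lock = Tuple[int, int]  # (start, len) as printed in the statedump


def lock_end(start: int, length: int) -> Optional[int]:
    """Exclusive end offset of a lock; None means 'to end of file' (len=0)."""
    return None if length == 0 else start + length


def overlaps(a: Lock, b: Lock) -> bool:
    """True if two WRITE byte-range locks cover at least one common byte."""
    (s1, l1), (s2, l2) = a, b
    e1, e2 = lock_end(s1, l1), lock_end(s2, l2)
    # Ranges intersect when each one starts before the other one ends.
    return (e2 is None or s1 < e2) and (e1 is None or s2 < e1)


heal_chunk = (2195849216, 131072)  # ACTIVE 128 KiB lock from the dump (pid -6)
full_lock = (0, 0)                 # arbiter-style lock for a write()
range_lock = (0, 131072)           # hypothetical replica-style lock, write at offset 0

print(overlaps(full_lock, heal_chunk))   # True: the write has to wait
print(overlaps(range_lock, heal_chunk))  # False: the write can proceed
```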

    You can look at the client (mount) or glustershd logs to see if
    there are messages indicating that a file is being healed, something
    along the lines of "Completed data selfheal on xxx".
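Scanning for that message is easy to script. The sketch below only matches the phrase quoted above; the rest of the log-line layout differs between versions and is deliberately not parsed. The log location is an assumption: typically /var/log/glusterfs/glustershd.log for the self-heal daemon, and a file named after the mount point under /var/log/glusterfs/ for the client. The helper name healed_items is hypothetical.

```python
import re
from typing import Iterable, List

# Only the phrase quoted in the reply is matched; the token after "on"
# (shown as "xxx" above) is captured up to the first character that is
# not a letter, digit, underscore or dash.
HEAL_RE = re.compile(r"Completed data selfheal on ([\w-]+)")


def healed_items(lines: Iterable[str]) -> List[str]:
    """Return the token following 'Completed data selfheal on' in each matching line."""
    found = []
    for line in lines:
        match = HEAL_RE.search(line)
        if match:
            found.append(match.group(1))
    return found


# Usage (log path is an assumption, adjust to your installation):
# with open("/var/log/glusterfs/glustershd.log", errors="replace") as log:
#     for item in healed_items(log):
#         print(item)
```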

    
      inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0,
        start=0, len=0, pid = 0, owner=c4fd2d78487f0000,
        client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=dc752e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=34832e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=d44d2e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=306f2e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=8c902e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=782c2e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=1c0b2e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0,
        len=0, pid = 0, owner=24332e78487f0000, client=0x7fbe100e1380,
        connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
        blocked at 2015-10-21 11:37:45

        
        There seem to be multiple locks in BLOCKED state, which doesn't
        look normal to me. The other 2 nodes have only 2 ACTIVE locks at
        the same time.
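Tallying the entries by state and by connection makes the pattern easier to see. This is a minimal sketch that assumes the dump section has been saved as text in the format quoted above (entries starting with inodelk.inodelk[N](STATE) and containing a connection-id= field); the function name is hypothetical. On the excerpt above it should report 3 ACTIVE entries (one in the self-heal domain and two in system_www1-replicate-0, all from the cluster-vm-3603 connection) and 9 BLOCKED entries from cluster-vm-3846, which together match the inodelk-count=12 in the section header.

```python
import re
from collections import Counter

# One statedump lock entry: its state, then (possibly across line breaks)
# the connection-id field, which ends at the next comma.
ENTRY_RE = re.compile(
    r"inodelk\.inodelk\[\d+\]\((ACTIVE|BLOCKED)\).*?connection-id=([^,\s]+),",
    re.DOTALL,
)


def lock_states(dump_text: str) -> Counter:
    """Count inodelk entries per (state, connection-id) pair."""
    return Counter(
        (m.group(1), m.group(2)) for m in ENTRY_RE.finditer(dump_text)
    )


# Usage:
# for (state, conn), n in lock_states(dump_text).items():
#     print(state, conn, n)
```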

        
        Below is the "gluster volume info" output.

        
        # gluster volume info

         
        Volume Name: data_mail1

        Type: Replicate

        Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c

        Status: Started

        Number of Bricks: 1 x 3 = 3

        Transport-type: tcp

        Bricks:

        Brick1: cluster-rep:/GFS/data/mail1

        Brick2: mail-rep:/GFS/data/mail1

        Brick3: web-rep:/GFS/data/mail1

        Options Reconfigured:

        performance.readdir-ahead: on

        cluster.quorum-count: 2

        cluster.quorum-type: fixed

        cluster.server-quorum-ratio: 51%

         
        Volume Name: data_www1

        Type: Replicate

        Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026

        Status: Started

        Number of Bricks: 1 x 3 = 3

        Transport-type: tcp

        Bricks:

        Brick1: cluster-rep:/GFS/data/www1

        Brick2: web-rep:/GFS/data/www1

        Brick3: mail-rep:/GFS/data/www1

        Options Reconfigured:

        performance.readdir-ahead: on

        cluster.quorum-type: fixed

        cluster.quorum-count: 2

        cluster.server-quorum-ratio: 51%

         
        Volume Name: system_mail1

        Type: Replicate

        Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5

        Status: Started

        Number of Bricks: 1 x 3 = 3

        Transport-type: tcp

        Bricks:

        Brick1: cluster-rep:/GFS/system/mail1

        Brick2: mail-rep:/GFS/system/mail1

        Brick3: web-rep:/GFS/system/mail1

        Options Reconfigured:

        performance.readdir-ahead: on

        cluster.quorum-type: none

        cluster.quorum-count: 2

        cluster.server-quorum-ratio: 51%

         
        Volume Name: system_www1

        Type: Replicate

        Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124

        Status: Started

        Number of Bricks: 1 x 3 = 3

        Transport-type: tcp

        Bricks:

        Brick1: cluster-rep:/GFS/system/www1

        Brick2: web-rep:/GFS/system/www1

        Brick3: mail-rep:/GFS/system/www1

        Options Reconfigured:

        performance.readdir-ahead: on

        cluster.quorum-type: none

        cluster.quorum-count: 2

        cluster.server-quorum-ratio: 51%
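One thing worth double-checking is that the arbiter placement matches the diagram. Assuming the volumes were created with `replica 3 arbiter 1`, GlusterFS makes the last brick of each replica set the arbiter, so for these single-set (1 x 3) volumes it is Brick3. The hypothetical helper below picks that brick out of `gluster volume info` text; on the output above it gives web-rep for data_mail1 and system_mail1 and mail-rep for data_www1 and system_www1, which agrees with the diagram. It does not handle distributed-replicate volumes with more than one replica set.

```python
import re
from typing import Dict, Optional


def last_bricks(volume_info: str) -> Dict[str, str]:
    """Map each volume name to the last brick listed for it.

    For a single 'replica 3 arbiter 1' set this is assumed to be the arbiter.
    """
    result: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in volume_info.splitlines():
        line = raw.strip()
        if line.startswith("Volume Name:"):
            current = line.split(":", 1)[1].strip()
        elif current is not None and re.match(r"Brick\d+:", line):
            # Later bricks overwrite earlier ones, so the last one wins.
            result[current] = line.split(":", 1)[1].strip()
    return result
```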

        
        The issue does not occur when I get rid of the 3rd (arbiter) brick.

      
    What do you mean by 'getting rid of'? Killing the 3rd brick process
    of the volume?

    
    Regards,

    Ravi

    
        If there's any additional information I could provide, please
        let me know.

        
        Greetings,

        Adrian
      

      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
    
    