Re: Copy operation freezes. Lots of locks in state BLOCKED (3-node setup with 1 arbiter)

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Mon, 26 Oct 2015 17:07:57 +0530



    On 10/23/2015 10:10 AM, Ravishankar N wrote:

    
      On 10/21/2015 05:55 PM, Adrian Gruntkowski wrote:

      
        Hello,

          
          I'm trying to track down a problem with my setup (version
          3.7.3 on Debian stable).

          I have a couple of volumes set up in a 3-node configuration,
          each with 1 brick as an arbiter.

          There are 4 volumes set up in a cross-over fashion across 3
          physical servers, like this:

          
                        ------------------->[ GigabitEthernet switch ]<--------------------
                        |                                ^                                |
                        |                                |                                |
                        V                                V                                V
          /---------------------------\    /---------------------------\    /---------------------------\
          | web-rep                   |    | cluster-rep               |    | mail-rep                  |
          |                           |    |                           |    |                           |
          | vols:                     |    | vols:                     |    | vols:                     |
          | system_www1               |    | system_www1               |    | system_www1(arbiter)      |
          | data_www1                 |    | data_www1                 |    | data_www1(arbiter)        |
          | system_mail1(arbiter)     |    | system_mail1              |    | system_mail1              |
          | data_mail1(arbiter)       |    | data_mail1                |    | data_mail1                |
          \---------------------------/    \---------------------------/    \---------------------------/

          
          Now, after a fresh boot-up, everything seems to be running
          fine. Then I start copying big files (KVM disk images) from
          local disk to gluster mounts.

          In the beginning it seems to run fine (although iowait
          seems to go so high that it clogs up I/O operations at some
          moments, but that's an issue for later). After some time the
          transfer freezes, then after some (long) time it advances in
          a short burst, only to freeze again. Another interesting thing
          is that I see a constant flow of network traffic on the
          interfaces dedicated to gluster, even when there's a "freeze".

          
          I ran "gluster volume statedump" during the transfer (the
          file is copied from local disk on cluster-rep onto a local
          mount of the "system_www1" volume). I observed the following
          section in the dump for the cluster-rep node:

          
          [xlator.features.locks.system_www1-locks.inode]
          path=/images/101/vm-101-disk-1.qcow2
          mandatory=0
          inodelk-count=12
          lock-dump.domain.domain=system_www1-replicate-0:self-heal
          inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
          lock-dump.domain.domain=system_www1-replicate-0
          inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:37:45
          inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22

        
      From the statedump, it looks like the self-heal daemon had taken
      locks to heal the file, which is why the locks attempted by the
      client (mount) are in the blocked state.

      In Arbiter volumes the client (mount) takes full locks (start=0,
      len=0) for every write() as opposed to normal replica volumes
      which take range locks (i.e. appropriate start,len values) for
      that write(). This is done to avoid network split-brains.

      So in normal replica volumes, clients can still write to a file
      while heal is going on, as long as the offsets don't overlap. This
      is not the case with arbiter volumes.
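      To illustrate the difference, here is a minimal sketch of the
      overlap check, not the actual locks translator code. The class
      and function names are made up for illustration, len=0 is
      assumed to mean "up to end of file", and lock types, owners and
      domains are ignored. The heal range is the one from the ACTIVE
      lock in the statedump above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InodeLock:
    """Hypothetical, simplified model of a byte-range inode lock."""
    start: int
    length: int  # assumption: 0 means "to end of file" (a full lock when start=0)

    def end(self) -> float:
        # Exclusive end offset; unbounded when length is 0.
        return float("inf") if self.length == 0 else self.start + self.length


def overlaps(a: InodeLock, b: InodeLock) -> bool:
    """Two half-open ranges conflict when each starts before the other ends."""
    return a.start < b.end() and b.start < a.end()


# Range held during heal, taken from the statedump (start=2195849216, len=131072).
heal_lock = InodeLock(start=2195849216, length=131072)

# A write to a different 128 KiB region, as a normal replica client would lock it.
range_write = InodeLock(start=0, length=131072)

# The same write as locked on an arbiter volume: full lock (start=0, len=0).
full_write = InodeLock(start=0, length=0)

print(overlaps(heal_lock, range_write))  # False: the range lock can be granted
print(overlaps(heal_lock, full_write))   # True: the full lock has to wait
```

      In this model a full lock conflicts with any range held by the
      heal, which matches the BLOCKED entries with start=0, len=0 in
      the dump.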

      You can look at the client or glustershd logs to see if there are
      messages that indicate healing of a file, something along the
      lines of "Completed data selfheal on xxx"
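      If it helps, a small sketch for scanning a log for such messages
      is below. The function name is hypothetical, the exact wording
      of the message may differ between versions, and the log location
      depends on your installation, so pass the path that applies to
      your system.

```python
def find_heal_messages(log_path, needle="Completed data selfheal"):
    """Return the log lines that contain the given heal message fragment.

    `needle` defaults to the phrase quoted above; adjust it if your
    version words the message differently.
    """
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, errors="replace") as log_file:
        for line in log_file:
            if needle in line:
                hits.append(line.rstrip("\n"))
    return hits
```

      Calling find_heal_messages() with the path to the glustershd or
      client (mount) log returns the matching lines, whose timestamps
      can then be compared with the "blocked at" times in the
      statedump.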

    
    Hi Adrian,

          Thanks for taking the time to send this mail. I raised this as
    a bug at https://bugzilla.redhat.com/show_bug.cgi?id=1275247, and a
    fix is posted for review at http://review.gluster.com/#/c/12426/

    
    Pranith

     
          inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=c4fd2d78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=dc752e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=34832e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=d44d2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=306f2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=8c902e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=782c2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=1c0b2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
          inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=24332e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45

          
          There seem to be multiple locks in BLOCKED state - which
          doesn't look normal to me. The other 2 nodes have

          only 2 ACTIVE locks at the same time.
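          For comparing nodes, the ACTIVE/BLOCKED counts per lock
          domain can be tallied from a statedump with a short script.
          This is only a sketch that assumes the line format shown in
          the dump above; the function name is made up and the sample
          is an abbreviated excerpt with most fields trimmed. For the
          full section above the tally is 1 ACTIVE lock in the
          self-heal domain plus 2 ACTIVE and 9 BLOCKED locks in
          system_www1-replicate-0, which adds up to inodelk-count=12.

```python
import re
from collections import Counter

# Patterns assume the statedump line format shown in this mail.
DOMAIN_RE = re.compile(r"lock-dump\.domain\.domain=(\S+)")
LOCK_RE = re.compile(r"inodelk\.inodelk\[\d+\]\((ACTIVE|BLOCKED)\)")


def count_lock_states(dump_text):
    """Return {domain: Counter({'ACTIVE': n, 'BLOCKED': m})} for a dump section."""
    counts = {}
    domain = None
    for line in dump_text.splitlines():
        domain_match = DOMAIN_RE.search(line)
        if domain_match:
            # Every following inodelk line belongs to this domain.
            domain = domain_match.group(1)
            counts.setdefault(domain, Counter())
            continue
        lock_match = LOCK_RE.search(line)
        if lock_match and domain is not None:
            counts[domain][lock_match.group(1)] += 1
    return counts


# Abbreviated excerpt of the dump above (fields after len= are trimmed).
sample = """\
lock-dump.domain.domain=system_www1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0
lock-dump.domain.domain=system_www1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072
inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0
"""

for name, states in count_lock_states(sample).items():
    print(name, dict(states))
```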

          
          Below is "gluster volume info" output.

          
          # gluster volume info

           
          Volume Name: data_mail1

          Type: Replicate

          Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c

          Status: Started

          Number of Bricks: 1 x 3 = 3

          Transport-type: tcp

          Bricks:

          Brick1: cluster-rep:/GFS/data/mail1

          Brick2: mail-rep:/GFS/data/mail1

          Brick3: web-rep:/GFS/data/mail1

          Options Reconfigured:

          performance.readdir-ahead: on

          cluster.quorum-count: 2

          cluster.quorum-type: fixed

          cluster.server-quorum-ratio: 51%

           
          Volume Name: data_www1

          Type: Replicate

          Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026

          Status: Started

          Number of Bricks: 1 x 3 = 3

          Transport-type: tcp

          Bricks:

          Brick1: cluster-rep:/GFS/data/www1

          Brick2: web-rep:/GFS/data/www1

          Brick3: mail-rep:/GFS/data/www1

          Options Reconfigured:

          performance.readdir-ahead: on

          cluster.quorum-type: fixed

          cluster.quorum-count: 2

          cluster.server-quorum-ratio: 51%

           
          Volume Name: system_mail1

          Type: Replicate

          Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5

          Status: Started

          Number of Bricks: 1 x 3 = 3

          Transport-type: tcp

          Bricks:

          Brick1: cluster-rep:/GFS/system/mail1

          Brick2: mail-rep:/GFS/system/mail1

          Brick3: web-rep:/GFS/system/mail1

          Options Reconfigured:

          performance.readdir-ahead: on

          cluster.quorum-type: none

          cluster.quorum-count: 2

          cluster.server-quorum-ratio: 51%

           
          Volume Name: system_www1

          Type: Replicate

          Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124

          Status: Started

          Number of Bricks: 1 x 3 = 3

          Transport-type: tcp

          Bricks:

          Brick1: cluster-rep:/GFS/system/www1

          Brick2: web-rep:/GFS/system/www1

          Brick3: mail-rep:/GFS/system/www1

          Options Reconfigured:

          performance.readdir-ahead: on

          cluster.quorum-type: none

          cluster.quorum-count: 2

          cluster.server-quorum-ratio: 51%

          
          The issue does not occur when I get rid of the 3rd (arbiter) brick.

        
      What do you mean by 'getting rid of'? Killing the 3rd brick
      process of the volume?

      
      Regards,

      Ravi

      
          If there's any additional information that is missing and I
          could provide, please let me know.

          
          Greetings,

          Adrian
        

        _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
      
      