Hi guys,
I've got a strange problem involving this timeline (it matches the "Log fragment 1" excerpt below):
19:56:50: the disk is detached from my system. This disk is actually the brick of volume V.
19:56:50: LVM sees the disk as unreachable and starts its maintenance procedures.
19:56:50: LVM unmounts my thin-provisioned volumes.
19:57:02: the health check on the affected bricks fails, moving them to a down state.
19:57:32: the XFS filesystem is unmounted.
At this point the brick filesystem is no longer mounted, and the underlying filesystem is empty (it doesn't even contain the brick directory). My assumption was that Gluster would stop itself under these conditions: it does not. Gluster slowly fills my entire root partition, recreating its full tree. My only warning sign is the root disk climbing towards 100% inode usage.
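The only early warning I can script for myself at the moment is a crude check like the one below. This is just a sketch: mountpoint and df are the standard CentOS 7 tools, while the 90% threshold and the /bricks/vol-* glob are my own choices, and the output would still need to be wired into whatever alerting you use:

#!/bin/bash
# Warn if a brick path is no longer a real mount point,
# or if the root filesystem is running out of inodes.
for b in /bricks/vol-*; do
    mountpoint -q "$b" || echo "WARNING: $b is not a mount point"
done
# df -Pi: inode usage; column 5 of the data row is IUse% for /
iuse=$(df -Pi / | awk 'NR==2 {gsub("%", "", $5); print $5}')
if [ "$iuse" -gt 90 ]; then
    echo "WARNING: root filesystem inode usage at ${iuse}%"
fi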
I've read the release notes for every version after mine (3.7.14, 3.7.15) without finding a relevant fix, and at this point I'm fairly sure this is an undocumented bug.
The servers were set up symmetrically (identical configuration on all three).
Could you please help me understand how to prevent Gluster from continuing to write to an unmounted filesystem? Thanks.
I'm running a 3-node replica on 3 Azure VMs. This is the configuration:

MD (yes, I use md to aggregate 4 disks into a single 4 TB volume):
/dev/md128:
Version : 1.2
Creation Time : Mon Aug 29 18:10:45 2016
Raid Level : raid0
Array Size : 4290248704 (4091.50 GiB 4393.21 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Mon Aug 29 18:10:45 2016
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : 128
UUID : d5c51214:43e48da9:49086616:c1371514
Events : 0
Number  Major  Minor  RaidDevice  State
   0      8      80       0       active sync   /dev/sdf
   1      8      96       1       active sync   /dev/sdg
   2      8     112       2       active sync   /dev/sdh
   3      8     128       3       active sync   /dev/sdi
PV, VG, LV status:

PV          VG       Fmt   Attr  PSize  PFree  DevSize  PV UUID
/dev/md127  VGdata   lvm2  a--   2.00t  2.00t  2.00t    Kxb6C0-FLIH-4rB1-DKyf-IQuR-bbPE-jm2mu0
/dev/md128  gluster  lvm2  a--   4.00t  1.07t  4.00t    lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m

VG       Attr    Ext    #PV  #LV  #SN  VSize  VFree  VG UUID                                 VProfile
VGdata   wz--n-  4.00m  1    0    0    2.00t  2.00t  XI2V2X-hdxU-0Jrn-TN7f-GSEk-7aNs-GCdTtn
gluster  wz--n-  4.00m  1    6    0    4.00t  1.07t  ztxX4f-vTgN-IKop-XePU-OwqW-T9k6-A6uDk0

LV                   VG       #Seg  Attr        LSize    Maj  Min  KMaj  KMin  Pool      Origin  Data%  Meta%  Move  Cpy%Sync  Log  Convert  LV UUID                                 LProfile
apps-data            gluster  1     Vwi-aotz--  50.00g   -1   -1   253   12    thinpool          0.08                                        znUMbm-ax1N-R7aj-dxLc-gtif-WOvk-9QC8tq
feed                 gluster  1     Vwi-aotz--  100.00g  -1   -1   253   14    thinpool          0.08                                        hZ4Isk-dELG-lgFs-2hJ6-aYid-8VKg-3jJko9
homes                gluster  1     Vwi-aotz--  1.46t    -1   -1   253   11    thinpool          58.58                                       salIPF-XvsA-kMnm-etjf-Uaqy-2vA9-9WHPkH
search-data          gluster  1     Vwi-aotz--  100.00g  -1   -1   253   13    thinpool          16.41                                       Z5hoa3-yI8D-dk5Q-2jWH-N5R2-ge09-RSjPpQ
thinpool             gluster  1     twi-aotz--  2.93t    -1   -1   253   9                       29.85  60.00                                oHTbgW-tiPh-yDfj-dNOm-vqsF-fBNH-o1izx2
video-asset-manager  gluster  1     Vwi-aotz--  100.00g  -1   -1   253   15    thinpool          0.07                                        4dOXga-96Wa-u3mh-HMmE-iX1I-o7ov-dtJ8lZ
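The Data% / Meta% figures above come from lvs; the same two columns can be pulled on their own with the standard lvs field names, e.g.:

lvs -o lv_name,data_percent,metadata_percent gluster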
Gluster volume configuration (all volumes use exactly the same configuration, so listing them all would be redundant):
Volume Name: vol-homes
Type: Replicate
Volume ID: 0c8fa62e-dd7e-429c-a19a-479404b5e9c6
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: glu01.prd.azr:/bricks/vol-homes/brick1
Brick2: glu02.prd.azr:/bricks/vol-homes/brick1
Brick3: glu03.prd.azr:/bricks/vol-homes/brick1
Options Reconfigured:
performance.readdir-ahead: on
cluster.server-quorum-type: server
nfs.disable: disable
cluster.lookup-unhashed: auto
performance.nfs.quick-read: on
performance.nfs.read-ahead: on
performance.cache-size: 4096MB
cluster.self-heal-daemon: enable
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
nfs.rpc-auth-unix: off
nfs.acl: off
performance.nfs.io-cache: on
performance.client-io-threads: on
performance.nfs.stat-prefetch: on
performance.nfs.io-threads: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.md-cache-timeout: 1
performance.cache-refresh-timeout: 1
performance.io-thread-count: 16
performance.high-prio-threads: 16
performance.normal-prio-threads: 16
performance.low-prio-threads: 16
performance.least-prio-threads: 1
cluster.server-quorum-ratio: 60
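As far as I can tell, the "health-check failed, going down" messages in the log fragment below come from the posix health-check thread, whose frequency is set by storage.health-check-interval. I have left it at the default, so the command below is only an example of the knob, not something I have actually tuned:

gluster volume set vol-homes storage.health-check-interval 30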
fstab:
/dev/gluster/homes  /bricks/vol-homes  xfs  defaults,noatime,nobarrier,nofail  0 2
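One idea I have not tried yet is to tie glusterd to the brick mounts with a systemd drop-in, so the service is pulled down when a mount disappears. This is only a sketch: glusterd.service is the unit shipped by the CentOS packages and RequiresMountsFor= is standard systemd, but I am not sure the dependency propagates when lvm2/dmeventd unmounts the volume behind systemd's back, nor whether stopping glusterd would also take the brick processes down with it:

# /etc/systemd/system/glusterd.service.d/brick-mounts.conf (untested)
[Unit]
RequiresMountsFor=/bricks/vol-homes /bricks/vol-search-data /bricks/vol-apps-data /bricks/vol-video-asset-manager

followed by a systemctl daemon-reload and a restart of glusterd.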
Software:
CentOS Linux release 7.1.1503 (Core)
glusterfs-api-3.7.13-1.el7.x86_64
glusterfs-libs-3.7.13-1.el7.x86_64
glusterfs-3.7.13-1.el7.x86_64
glusterfs-fuse-3.7.13-1.el7.x86_64
glusterfs-server-3.7.13-1.el7.x86_64
glusterfs-client-xlators-3.7.13-1.el7.x86_64
glusterfs-cli-3.7.13-1.el7.x86_64
Log fragment 1:
Sep 22 19:56:50 glu03 lvm[868]: WARNING: Device for PV lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m not found or rejected by a filter.
Sep 22 19:56:50 glu03 lvm[868]: Cannot change VG gluster while PVs are missing.
Sep 22 19:56:50 glu03 lvm[868]: Consider vgreduce --removemissing.
Sep 22 19:56:50 glu03 lvm[868]: Failed to extend thin metadata gluster-thinpool-tpool.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-homes.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-search-data.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-apps-data.
Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-video-asset-manager.
Sep 22 19:57:02 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:02.713428] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: health-check failed, going down
Sep 22 19:57:05 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:05.186146] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-apps-data-posix: health-check failed, going down
Sep 22 19:57:18 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:18.674279] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-search-data-posix: health-check failed, going down
Sep 22 19:57:32 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:32.714461] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: still alive! -> SIGTERM
Sep 22 19:57:32 glu03 kernel: XFS (dm-15): Unmounting Filesystem
Sep 22 19:57:35 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:35.186352] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-apps-data-posix: still alive! -> SIGTERM
Sep 22 19:57:35 glu03 kernel: XFS (dm-12): Unmounting Filesystem
Sep 22 19:57:48 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:48.674444] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-search-data-posix: still alive! -> SIGTERM
Sep 22 19:57:48 glu03 kernel: XFS (dm-13): Unmounting Filesystem