Re: [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem

Ravishankar N <ravishankar@xxxxxxxxxx> · Thu, 29 Sep 2016 17:46:28 +0530



    On 09/29/2016 05:18 PM, Sahina Bose wrote:

    
      Yes, this is a GlusterFS problem. Adding the gluster-users ML.

      
        On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari <davide@xxxxxxxxxxxx> wrote:

          
                        Hello,

                        maybe this is more GlusterFS than oVirt related,
                        but since oVirt integrates Gluster management
                        and I'm experiencing the problem in an oVirt
                        cluster, I'm writing here.

                      The problem is simple: I have a data domain
                      mapped on a replica 3 arbiter 1 Gluster volume
                      with 6 bricks, like this:

                      
                        Status of volume: data_ssd
                        Gluster process                                             TCP Port  RDMA Port  Online  Pid
                        ------------------------------------------------------------------------------
                        Brick vm01.storage.billy:/gluster/ssd/data/brick            49153     0          Y       19298
                        Brick vm02.storage.billy:/gluster/ssd/data/brick            49153     0          Y       6146
                        Brick vm03.storage.billy:/gluster/ssd/data/arbiter_brick    49153     0          Y       6552
                        Brick vm03.storage.billy:/gluster/ssd/data/brick            49154     0          Y       6559
                        Brick vm04.storage.billy:/gluster/ssd/data/brick            49152     0          Y       6077
                        Brick vm02.storage.billy:/gluster/ssd/data/arbiter_brick    49154     0          Y       6153
                        Self-heal Daemon on localhost                               N/A       N/A        Y       30746
                        Self-heal Daemon on vm01.storage.billy                      N/A       N/A        Y       196058
                        Self-heal Daemon on vm03.storage.billy                      N/A       N/A        Y       23205
                        Self-heal Daemon on vm04.storage.billy                      N/A       N/A        Y       8246

                      
                    Now, I put the vm04 host into maintenance from
                    oVirt, ticking the "Stop gluster" checkbox, and
                    oVirt didn't complain about anything. But when I
                    tried to run a new VM it complained about a "storage
                    I/O problem", while the data storage domain status
                    stayed UP the whole time.

                    
                  Looking in the gluster logs I can see this:

                  
                    [2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
                    [2016-09-29 11:02:28.124151] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
                    [2016-09-29 11:02:28.126580] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
                    [2016-09-29 11:02:28.127374] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FGETXATTR on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]
                    [2016-09-29 11:02:28.128130] W [MSGID: 108027] [afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no read subvols for (null)
                    [2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 8201: READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210 (Input/output error)
                    [2016-09-29 11:02:28.130824] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing FSTAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]

                  
    Does `gluster volume heal data_ssd info split-brain` report that the
    file is in split-brain, with vm04 still being down? 

    If yes, could you provide the extended attributes of this gfid from
    all 3 bricks:

    getfattr -d -m . -e hex /path/to/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d
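
In case it helps when reading that output, here is a minimal sketch for decoding the trusted.afr.* values that getfattr prints. It assumes the usual AFR changelog layout of three big-endian 32-bit counters (pending data, metadata and entry operations, in that order). The function name decode_afr and the sample value are made up for illustration; the value is not output from this volume.

```shell
#!/bin/sh
# Decode one trusted.afr.<volume>-client-<N> value as printed by `getfattr -e hex`.
# Assumption: 24 hex digits = three 32-bit counters (data, metadata, entry).
decode_afr() {
    hex=${1#0x}                                   # strip the leading 0x, if any
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)        # pending data operations
    meta=$(printf '%s' "$hex" | cut -c9-16)       # pending metadata operations
    entry=$(printf '%s' "$hex" | cut -c17-24)     # pending entry operations
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}

# Illustrative value only (hypothetical, not taken from data_ssd).
decode_afr 0x000000020000000000000000
```

A non-zero counter in trusted.afr.data_ssd-client-N on one brick means that brick is recording pending operations for (i.e. blaming) brick N.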

    
    If no, then I'm guessing that it is not in actual split-brain (hence
    the 'Possible split-brain' message). If the node you brought down
    contains the only good copy of the file (i.e. the other data brick
    and the arbiter are up, and the arbiter 'blames' this other brick), all
    I/O fails with EIO to prevent the file from getting into actual
    split-brain. The heals will happen when the good node comes back up, and
    I/O should be allowed again at that point.
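
To see which bricks are involved, here is a rough sketch that lists the replica set each brick falls into. It assumes bricks are grouped in `volume create` order, `replica` bricks per set, with sets numbered from 0 and, for arbiter 1, the last brick of each set acting as the arbiter. The helper name replica_sets is hypothetical; the brick list is the one from the create command quoted in this thread.

```shell
#!/bin/sh
# Print "<replica set> <brick>" for each brick, given the replica count
# followed by the bricks in the order they were passed to `volume create`.
replica_sets() {
    replica=$1
    shift
    i=0
    for brick in "$@"; do
        # Integer division maps bricks 0..replica-1 to set 0, and so on.
        echo "replicate-$((i / replica)) $brick"
        i=$((i + 1))
    done
}

replica_sets 3 \
    vm01.storage.billy:/gluster/ssd/data/brick \
    vm02.storage.billy:/gluster/ssd/data/brick \
    vm03.storage.billy:/gluster/ssd/data/arbiter_brick \
    vm03.storage.billy:/gluster/ssd/data/brick \
    vm04.storage.billy:/gluster/ssd/data/brick \
    vm02.storage.billy:/gluster/ssd/data/arbiter_brick
```

Under that assumption, data_ssd-replicate-1 (the subvolume named in the log) is made of the vm03 data brick, the vm04 data brick and the vm02 arbiter, so with vm04 down only one data brick and the arbiter remain in that set.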

    
    -Ravi

    
                    [2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 8202: FSTAT() /ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527 => -1 (Input/output error)
                    The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)" repeated 11 times between [2016-09-29 11:02:28.126580] and [2016-09-29 11:02:28.517744]
                    [2016-09-29 11:02:28.518607] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1: Failing STAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed. [Input/output error]

                  
                Now, how is it possible to have a split-brain if I
                stopped just ONE server, which had just ONE of the six
                bricks, and it was cleanly shut down with maintenance
                mode from oVirt?

                
              I created the volume originally this way:

              # gluster volume create data_ssd replica 3 arbiter 1 \
                  vm01.storage.billy:/gluster/ssd/data/brick \
                  vm02.storage.billy:/gluster/ssd/data/brick \
                  vm03.storage.billy:/gluster/ssd/data/arbiter_brick \
                  vm03.storage.billy:/gluster/ssd/data/brick \
                  vm04.storage.billy:/gluster/ssd/data/brick \
                  vm02.storage.billy:/gluster/ssd/data/arbiter_brick
              # gluster volume set data_ssd group virt
              # gluster volume set data_ssd storage.owner-uid 36 && gluster volume set data_ssd storage.owner-gid 36
              # gluster volume start data_ssd

                
                                -- 

                                
                                    Davide Ferrari

                                    
                                    Senior Systems Engineer

                                  

            
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users