Re: I/O error on replicated volume

Mohammed Rafi K C <rkavunga@xxxxxxxxxx> · Mon, 23 Mar 2015 10:50:46 +0530



    On 03/21/2015 07:49 PM, Jonathan Heese
      wrote:

    
        Mohamed,
        

        I have completed the steps you suggested (unmount all, stop
          the volume, set the config.transport to tcp, start the volume,
          mount, etc.), and the behavior has indeed changed.
        

        [root@duke ~]# gluster volume info

          
          Volume Name: gluster_disk

          Type: Replicate

          Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd

          Status: Started

          Number of Bricks: 1 x 2 = 2

          Transport-type: tcp

          Bricks:

          Brick1: duke-ib:/bricks/brick1

          Brick2: duchess-ib:/bricks/brick1

          Options Reconfigured:

          config.transport: tcp
        

          [root@duke ~]# gluster volume status

          Status of volume: gluster_disk

          Gluster process                                        
          Port    Online  Pid

------------------------------------------------------------------------------

          Brick duke-ib:/bricks/brick1                           
          49152   Y       16362

          Brick duchess-ib:/bricks/brick1                        
          49152   Y       14155

          NFS Server on localhost                                
          2049    Y       16374

          Self-heal Daemon on localhost                          
          N/A     Y       16381

          NFS Server on duchess-ib                               
          2049    Y       14167

          Self-heal Daemon on duchess-ib                         
          N/A     Y       14174

          
          Task Status of Volume gluster_disk

------------------------------------------------------------------------------

          There are no active volume tasks

          
        I am no longer seeing the I/O errors during prolonged periods
          of write I/O that I was seeing when the transport was set to
          rdma. However, I am seeing this message on both nodes every 3
          seconds (almost exactly):
        

        ==> /var/log/glusterfs/nfs.log <==

          [2015-03-21 14:17:40.379719] W
          [rdma.c:1076:gf_rdma_cm_event_handler]
          0-gluster_disk-client-1: cma event RDMA_CM_EVENT_REJECTED,
          error 8 (me:10.10.10.1:1023 peer:10.10.10.2:49152)

        
        Is this something to worry about? 
      
    
    If you are not using nfs to export the volumes, there is nothing to
    worry. 

    
        Any idea why there are rdma pieces in play when I've set my
          transport to tcp?
      
    
    there should not be any piece of rdma,if possible, can you paste the
    volfile for nfs server. You can find the volfile in
    /var/lib/glusterd/nfs/nfs-server.vol or
    /usr/local/var/lib/glusterd/nfs/nfs-server.vol.

    
    Rafi KC

    
        The actual I/O appears to be handled properly and I've seen
          no further errors in the testing I've done so far.
        

        Thanks.

        
        Regards,
        Jon Heese
        

          From:
              gluster-users-bounces@xxxxxxxxxxx
              <gluster-users-bounces@xxxxxxxxxxx> on behalf of
              Jonathan Heese <jheese@xxxxxxxxx>

              Sent: Friday, March 20, 2015 7:04 AM

              To: Mohammed Rafi K C

              Cc: gluster-users

              Subject: Re:  I/O error on
              replicated volume
             
          
            Mohammed,
            

            Thanks very much for the reply.  I will try that and
              report back.

              
              Regards,
              Jon Heese
            
            
              On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C" <rkavunga@xxxxxxxxxx>
              wrote:

              
                On 03/19/2015 10:16 PM,
                  Jonathan Heese wrote:

                
                    Hello all,
                     
                    Does
                        anyone else have any further suggestions for
                        troubleshooting this?
                     
                    To
                        sum up: I have a 2 node 2 brick replicated
                        volume, which holds a handful of iSCSI image
                        files which are mounted and served up by tgtd
                        (CentOS 6) to a handful of devices on a
                        dedicated iSCSI network.  The most important
                        iSCSI clients (initiators) are four VMware ESXi
                        5.5 hosts that use the iSCSI volumes as backing
                        for their datastores for virtual machine
                        storage.
                     
                    After
                        a few minutes of sustained writing to the
                        volume, I am seeing a massive flood (over 1500
                        per second at times) of this error in
                        /var/log/glusterfs/mnt-gluster-disk.log:
                    [2015-03-16
                        02:24:07.582801] W
                        [fuse-bridge.c:2242:fuse_writev_cbk]
                        0-glusterfs-fuse: 635358: WRITE => -1
                        (Input/output error)
                     
                    When
                        this happens, the ESXi box fails its write
                        operation and returns an error to the effect of
                        “Unable to write data to datastore”.  I don’t
                        see anything else in the supporting logs to
                        explain the root cause of the i/o errors.
                     
                    Any
                        and all suggestions are appreciated.  Thanks.
                     
                  
                From the mount logs, i assume that your volume transport
                type is rdma. There are some known issues for rdma in
                3.5.3, and the patch for to address those issues are
                already send to upstream [1]. From the logs, I'm not
                sure and it is hard to tell you whether this problem is
                something related to rdma transport or not. To make sure
                that the tcp transport is works well in this scenario,
                if possible can you try to reproduce the same using tcp
                type volumes. You can change the transport type of
                volume by doing the following step ( not recommended in
                normal use case).

                
                1) unmount every client

                2) stop the volume

                3) run gluster volume set volname config.transport tcp

                4) start the volume again

                5) mount the clients

                
                [1] : http://goo.gl/2PTL61

                
                Regards

                Rafi KC

                
                      Jon Heese

                        Systems
                            Engineer

                        INetU
                            Managed Hosting

                        P:
                          610.266.7441 x 261

                        F:
                          610.266.7434

                        www.inetu.net
                      **
                            This message contains confidential
                            information, which also may be privileged,
                            and is intended only for the person(s)
                            addressed above. Any unauthorized use,
                            distribution, copying or disclosure of
                            confidential and/or privileged information
                            is strictly prohibited. If you have received
                            this communication in error, please erase
                            all copies of the message and its
                            attachments and notify the sender
                            immediately via reply e-mail. **
                    
                     
                        From: Jonathan Heese
                            

                            Sent: Tuesday, March 17, 2015 12:36
                            PM

                            To: 'Ravishankar N'; 
                              gluster-users@xxxxxxxxxxx

                            Subject: RE:  I/O
                            error on replicated volume
                      
                    
                    Ravi,
                     
                    The
                        last lines in the mount log before the massive
                        vomit of I/O errors are from 22 minutes prior,
                        and seem innocuous to me:
                     
                    [2015-03-16
                        01:37:07.126340] E
                        [client-handshake.c:1760:client_query_portmap_cbk]
                        0-gluster_disk-client-0: failed to get the port
                        number for remote subvolume. Please run 'gluster
                        volume status' on server to see if brick process
                        is running.
                    [2015-03-16
                        01:37:07.126587] W
                        [rdma.c:4273:gf_rdma_disconnect]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                        [0x7fd9c557bccf]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                        [0x7fd9c557a995]
                        (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                        [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0:
                        disconnect called (peer:10.10.10.1:24008)
                    [2015-03-16
                        01:37:07.126687] E
                        [client-handshake.c:1760:client_query_portmap_cbk]
                        0-gluster_disk-client-1: failed to get the port
                        number for remote subvolume. Please run 'gluster
                        volume status' on server to see if brick process
                        is running.
                    [2015-03-16
                        01:37:07.126737] W
                        [rdma.c:4273:gf_rdma_disconnect]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                        [0x7fd9c557bccf]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                        [0x7fd9c557a995]
                        (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                        [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1:
                        disconnect called (peer:10.10.10.2:24008)
                    [2015-03-16
                        01:37:10.730165] I
                        [rpc-clnt.c:1729:rpc_clnt_reconfig]
                        0-gluster_disk-client-0: changing port to 49152
                        (from 0)
                    [2015-03-16
                        01:37:10.730276] W
                        [rdma.c:4273:gf_rdma_disconnect]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                        [0x7fd9c557bccf]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                        [0x7fd9c557a995]
                        (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                        [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0:
                        disconnect called (peer:10.10.10.1:24008)
                    [2015-03-16
                        01:37:10.739500] I
                        [rpc-clnt.c:1729:rpc_clnt_reconfig]
                        0-gluster_disk-client-1: changing port to 49152
                        (from 0)
                    [2015-03-16
                        01:37:10.739560] W
                        [rdma.c:4273:gf_rdma_disconnect]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                        [0x7fd9c557bccf]
                        (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                        [0x7fd9c557a995]
                        (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                        [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1:
                        disconnect called (peer:10.10.10.2:24008)
                    [2015-03-16
                        01:37:10.741883] I
                        [client-handshake.c:1677:select_server_supported_programs]
                        0-gluster_disk-client-0: Using Program GlusterFS
                        3.3, Num (1298437), Version (330)
                    [2015-03-16
                        01:37:10.744524] I
                        [client-handshake.c:1462:client_setvolume_cbk]
                        0-gluster_disk-client-0: Connected to
                        10.10.10.1:49152, attached to remote volume
                        '/bricks/brick1'.
                    [2015-03-16
                        01:37:10.744537] I
                        [client-handshake.c:1474:client_setvolume_cbk]
                        0-gluster_disk-client-0: Server and Client
                        lk-version numbers are not same, reopening the
                        fds
                    [2015-03-16
                        01:37:10.744566] I
                        [afr-common.c:4267:afr_notify]
                        0-gluster_disk-replicate-0: Subvolume
                        'gluster_disk-client-0' came back up; going
                        online.
                    [2015-03-16
                        01:37:10.744627] I
                        [client-handshake.c:450:client_set_lk_version_cbk]
                        0-gluster_disk-client-0: Server lk version = 1
                    [2015-03-16
                        01:37:10.753037] I
                        [client-handshake.c:1677:select_server_supported_programs]
                        0-gluster_disk-client-1: Using Program GlusterFS
                        3.3, Num (1298437), Version (330)
                    [2015-03-16
                        01:37:10.755657] I
                        [client-handshake.c:1462:client_setvolume_cbk]
                        0-gluster_disk-client-1: Connected to
                        10.10.10.2:49152, attached to remote volume
                        '/bricks/brick1'.
                    [2015-03-16
                        01:37:10.755676] I
                        [client-handshake.c:1474:client_setvolume_cbk]
                        0-gluster_disk-client-1: Server and Client
                        lk-version numbers are not same, reopening the
                        fds
                    [2015-03-16
                        01:37:10.761945] I
                        [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse:
                        switched to graph 0
                    [2015-03-16
                        01:37:10.762144] I
                        [client-handshake.c:450:client_set_lk_version_cbk]
                        0-gluster_disk-client-1: Server lk version = 1
                    [2015-03-16
                          01:37:10.762279] I
                        [fuse-bridge.c:3953:fuse_init] 0-glusterfs-fuse:
                        FUSE inited with protocol versions: glusterfs
                        7.22 kernel 7.14
                    [2015-03-16
                          01:59:26.098670] W
                        [fuse-bridge.c:2242:fuse_writev_cbk]
                        0-glusterfs-fuse: 292084: WRITE => -1
                        (Input/output error)
                    …
                     
                    I’ve
                        seen no indication of split-brain on any files
                        at any point in this (ever since downdating from
                        3.6.2 to 3.5.3, which is when this particular
                        issue started):
                    [root@duke
                        gfapi-module-for-linux-target-driver-]# gluster
                        v heal gluster_disk info
                    Brick
                        duke.jonheese.local:/bricks/brick1/
                    Number
                        of entries: 0
                     
                    Brick
                        duchess.jonheese.local:/bricks/brick1/
                    Number
                        of entries: 0
                     
                    Thanks.
                     
                    
                      Jon Heese

                        Systems
                            Engineer

                        INetU
                            Managed Hosting

                        P:
                          610.266.7441 x 261

                        F:
                          610.266.7434

                        www.inetu.net
                      **
                            This message contains confidential
                            information, which also may be privileged,
                            and is intended only for the person(s)
                            addressed above. Any unauthorized use,
                            distribution, copying or disclosure of
                            confidential and/or privileged information
                            is strictly prohibited. If you have received
                            this communication in error, please erase
                            all copies of the message and its
                            attachments and notify the sender
                            immediately via reply e-mail. **
                    
                     
                        From: Ravishankar N [mailto:ravishankar@xxxxxxxxxx]
                            

                            Sent: Tuesday, March 17, 2015 12:35
                            AM

                            To: Jonathan Heese; gluster-users@xxxxxxxxxxx

                            Subject: Re:  I/O
                            error on replicated volume
                      
                    
                      On 03/17/2015 02:14 AM,
                        Jonathan Heese wrote:
                    
                    
                          Hello,

                              
                              So I resolved my previous issue with
                              split-brains and the lack of self-healing
                              by dropping my installed glusterfs*
                              packages from 3.6.2 to 3.5.3, but now I've
                              picked up a new issue, which actually
                              makes normal use of the volume practically
                              impossible.

                              
                              A little background for those not already
                              paying close attention:

                              I have a 2 node 2 brick replicating volume
                              whose purpose in life is to hold iSCSI
                              target files, primarily for use to provide
                              datastores to a VMware ESXi cluster.  The
                              plan is to put a handful of image files on
                              the Gluster volume, mount them locally on
                              both Gluster nodes, and run tgtd on both,
                              pointed to the image files on the mounted
                              gluster volume. Then the ESXi boxes will
                              use multipath (active/passive) iSCSI to
                              connect to the nodes, with automatic
                              failover in case of planned or unplanned
                              downtime of the Gluster nodes.

                              
                              In my most recent round of testing with
                              3.5.3, I'm seeing a massive failure to
                              write data to the volume after about 5-10
                              minutes, so I've simplified the scenario a
                              bit (to minimize the variables) to: both
                              Gluster nodes up, only one node (duke)
                              mounted and running tgtd, and just regular
                              (single path) iSCSI from a single ESXi
                              server.

                              
                              About 5-10 minutes into migration a VM
                              onto the test datastore, /var/log/messages
                              on duke gets blasted with a ton of
                              messages exactly like this:
                          Mar
                            15 22:24:06 duke tgtd: bs_rdwr_request(180)
                            io error 0x1781e00 2a -1 512 22971904,
                            Input/output error
                           
                          And
                            /var/log/glusterfs/mnt-gluster_disk.log gets
                            blased with a ton of messages exactly like
                            this:
                          [2015-03-16
                            02:24:07.572279] W
                            [fuse-bridge.c:2242:fuse_writev_cbk]
                            0-glusterfs-fuse: 635299: WRITE => -1
                            (Input/output error)
                           
                        
                        Are there any messages in the mount log from AFR
                        about split-brain just before the above line
                        appears?

                        Does `gluster v heal <VOLNAME> info` show
                        any files? Performing I/O on files that are in
                        split-brain fail with EIO.

                        
                        -Ravi

                        
                          And
                            the write operation from VMware's side fails
                            as soon as these messages start.
                           
                          I
                            don't see any other errors (in the log files
                            I know of) indicating the root cause of
                            these i/o errors.  I'm sure that this is not
                            enough information to tell what's going on,
                            but can anyone help me figure out what to
                            look at next to figure this out?
                           
                          I've
                            also considered using Dan Lambright's
                            libgfapi gluster module for tgtd (or
                            something similar) to avoid going through
                            FUSE, but I'm not sure whether that would be
                            irrelevant to this problem, since I'm not
                            100% sure if it lies in FUSE or elsewhere.
                           
                          Thanks!
                           
                          Jon Heese

                            Systems
                                Engineer

                            INetU
                                Managed Hosting

                            P:
                              610.266.7441 x 261

                            F:
                              610.266.7434

                            www.inetu.net
                          **
                                This message contains confidential
                                information, which also may be
                                privileged, and is intended only for the
                                person(s) addressed above. Any
                                unauthorized use, distribution, copying
                                or disclosure of confidential and/or
                                privileged information is strictly
                                prohibited. If you have received this
                                communication in error, please erase all
                                copies of the message and its
                                attachments and notify the sender
                                immediately via reply e-mail. **
                           
                        
                      _______________________________________________
                      Gluster-users mailing list
                      Gluster-users@xxxxxxxxxxx
                      http://www.gluster.org/mailman/listinfo/gluster-users
                    
                     
                  _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
                
                
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users