Re: I/O error on replicated volume

Joe Julian <joe@xxxxxxxxxxxxxxxx> · Thu, 26 Mar 2015 13:20:02 -0700



    Every 3 seconds implies, to me, that it's trying to reconnect to a
    server.

    
    On 03/26/2015 01:12 PM, Jonathan Heese
      wrote:

    
          Joe,
          

          Hmmm.... But every 3 seconds for all eternity? Seems a bit
            much for a "warning", doesn't it?
          

          Did you see my last reply? My nfs-server.vol
            file seems to indicate that RDMA is still in use in some
            capacity... Is this normal? If not, how can I reconcile
            this?
          

          Thanks.
          

          Regards,
          Jon Heese

          
            From:
                gluster-users-bounces@xxxxxxxxxxx
                <gluster-users-bounces@xxxxxxxxxxx> on behalf of
                Joe Julian <joe@xxxxxxxxxxxxxxxx>

                Sent: Thursday, March 26, 2015 4:08 PM

                To: gluster-users@xxxxxxxxxxx

                Subject: Re:  I/O error on
                replicated volume
               
            
            The RDMA warnings are not relevant if you don't use
              RDMA. It's simply pointing out that it tried to register
              and it couldn't, which would be expected if your system
              doesn't support it.

              
              On 03/23/2015 12:29 AM,
                Mohammed Rafi K C wrote:

              
                On 03/23/2015 11:28 AM,
                  Jonathan Heese wrote:

                
                  On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C"
                    <rkavunga@xxxxxxxxxx>
                    wrote:

                    
                      On 03/21/2015 07:49
                        PM, Jonathan Heese wrote:

                      
                          Mohamed,
                          

                          I have completed the steps you suggested
                            (unmount all, stop the volume, set the
                            config.transport to tcp, start the volume,
                            mount, etc.), and the behavior has indeed
                            changed.
                          

                          [root@duke ~]# gluster volume info

                            
                            Volume Name: gluster_disk

                            Type: Replicate

                            Volume ID:
                            2307a5a8-641e-44f4-8eaf-7cc2b704aafd

                            Status: Started

                            Number of Bricks: 1 x 2 = 2

                            Transport-type: tcp

                            Bricks:

                            Brick1: duke-ib:/bricks/brick1

                            Brick2: duchess-ib:/bricks/brick1

                            Options Reconfigured:

                            config.transport: tcp
                          

                            [root@duke ~]# gluster volume status

                            Status of volume: gluster_disk

                            Gluster
                            process                                        
                            Port    Online  Pid

------------------------------------------------------------------------------

                            Brick
                            duke-ib:/bricks/brick1                           
                            49152   Y       16362

                            Brick
                            duchess-ib:/bricks/brick1                        
                            49152   Y       14155

                            NFS Server on
                            localhost                                
                            2049    Y       16374

                            Self-heal Daemon on
                            localhost                           N/A    
                            Y       16381

                            NFS Server on
                            duchess-ib                               
                            2049    Y       14167

                            Self-heal Daemon on
                            duchess-ib                          N/A    
                            Y       14174

                            
                            Task Status of Volume gluster_disk

------------------------------------------------------------------------------

                            There are no active volume tasks

                            
                          I am no longer seeing the I/O errors during
                            prolonged periods of write I/O that I was
                            seeing when the transport was set to rdma.
                            However, I am seeing this message on both
                            nodes every 3 seconds (almost exactly):
                          

                          ==> /var/log/glusterfs/nfs.log <==

                            [2015-03-21 14:17:40.379719] W
                            [rdma.c:1076:gf_rdma_cm_event_handler]
                            0-gluster_disk-client-1: cma event
                            RDMA_CM_EVENT_REJECTED, error 8
                            (me:10.10.10.1:1023 peer:10.10.10.2:49152)

                          
                          Is this something to worry about? 
                        
                      
                      If you are not using nfs to export the volumes,
                      there is nothing to worry. 

                    
                  I'm using the native glusterfs FUSE component to mount
                  the volume locally on both servers -- I assume that
                  you're referring to the standard NFS protocol stuff,
                  which I'm not using here.
                  

                  Incidentally, I would like to keep my logs from
                    filling up with junk if possible.  Is there
                    something I can do to get rid of these (useless?)
                    error messages?

                  
                If i understand correctly, you are getting this enormous
                log message from nfs log only, all other logs and
                everything are fine now, right ? If that is the case,
                and you are not at all using nfs for exporting the
                volume, as  a workaround you can disable nfs for your
                volume or cluster. (gluster v set nfs.disable on). This
                will turnoff your gluster nfs server, and you will no
                longer get those log messages.

                
                              Any idea why there are rdma pieces in
                                play when I've set my transport to tcp?
                            
                          
                          there should not be any piece of rdma,if
                          possible, can you paste the volfile for nfs
                          server. You can find the volfile in
                          /var/lib/glusterd/nfs/nfs-server.vol or
                          /usr/local/var/lib/glusterd/nfs/nfs-server.vol

                        
                      I will get this for you when I can.  Thanks.
                    
                  
                If you can make it, that will be great help to
                understand the problem.

                
                Rafi KC

                
                      Regards,
                      Jon Heese
                      

                        Rafi KC

                          
                              The actual I/O appears to be handled
                                properly and I've seen no further errors
                                in the testing I've done so far.
                              

                              Thanks.

                              
                              Regards,
                              Jon Heese
                              

                                From:
                                    
                                      gluster-users-bounces@xxxxxxxxxxx
                                    
<gluster-users-bounces@xxxxxxxxxxx> on behalf of Jonathan
                                    Heese 
                                      <jheese@xxxxxxxxx>

                                    Sent: Friday, March 20, 2015
                                    7:04 AM

                                    To: Mohammed Rafi K C

                                    Cc: gluster-users

                                    Subject: Re:
                                    I/O error on replicated volume
                                   
                                
                                  Mohammed,
                                  

                                  Thanks very much for the reply.
                                     I will try that and report back.

                                    
                                    Regards,
                                    Jon Heese
                                  
                                  
                                    On Mar 20, 2015, at 3:26 AM,
                                    "Mohammed Rafi K C" <rkavunga@xxxxxxxxxx>
                                    wrote:

                                    
                                      On
                                        03/19/2015 10:16 PM, Jonathan
                                        Heese wrote:

                                      
                                          Hello
                                                all,
                                           
                                          Does
                                              anyone else have any
                                              further suggestions for
                                              troubleshooting this?
                                           
                                          To
                                              sum up: I have a 2 node 2
                                              brick replicated volume,
                                              which holds a handful of
                                              iSCSI image files which
                                              are mounted and served up
                                              by tgtd (CentOS 6) to a
                                              handful of devices on a
                                              dedicated iSCSI network. 
                                              The most important iSCSI
                                              clients (initiators) are
                                              four VMware ESXi 5.5 hosts
                                              that use the iSCSI volumes
                                              as backing for their
                                              datastores for virtual
                                              machine storage.
                                           
                                          After
                                              a few minutes of sustained
                                              writing to the volume, I
                                              am seeing a massive flood
                                              (over 1500 per second at
                                              times) of this error in
                                              /var/log/glusterfs/mnt-gluster-disk.log:
                                          [2015-03-16
                                              02:24:07.582801] W
                                              [fuse-bridge.c:2242:fuse_writev_cbk]
                                              0-glusterfs-fuse: 635358:
                                              WRITE => -1
                                              (Input/output error)
                                           
                                          When
                                              this happens, the ESXi box
                                              fails its write operation
                                              and returns an error to
                                              the effect of “Unable to
                                              write data to datastore”. 
                                              I don’t see anything else
                                              in the supporting logs to
                                              explain the root cause of
                                              the i/o errors.
                                           
                                          Any
                                              and all suggestions are
                                              appreciated.  Thanks.
                                           
                                        
                                      From the mount logs, i assume that
                                      your volume transport type is
                                      rdma. There are some known issues
                                      for rdma in 3.5.3, and the patch
                                      for to address those issues are
                                      already send to upstream [1]. From
                                      the logs, I'm not sure and it is
                                      hard to tell you whether this
                                      problem is something related to
                                      rdma transport or not. To make
                                      sure that the tcp transport is
                                      works well in this scenario, if
                                      possible can you try to reproduce
                                      the same using tcp type volumes.
                                      You can change the transport type
                                      of volume by doing the following
                                      step ( not recommended in normal
                                      use case).

                                      
                                      1) unmount every client

                                      2) stop the volume

                                      3) run gluster volume set volname
                                      config.transport tcp

                                      4) start the volume again

                                      5) mount the clients

                                      
                                      [1] : http://goo.gl/2PTL61

                                      
                                      Regards

                                      Rafi KC

                                      
                                            Jon
                                                  Heese

                                              Systems
                                                  Engineer

                                              INetU
                                                  Managed Hosting

                                              P:
                                                610.266.7441 x 261

                                              F:
                                                610.266.7434

                                              www.inetu.net
                                            ** This
                                                  message contains
                                                  confidential
                                                  information, which
                                                  also may be
                                                  privileged, and is
                                                  intended only for the
                                                  person(s) addressed
                                                  above. Any
                                                  unauthorized use,
                                                  distribution, copying
                                                  or disclosure of
                                                  confidential and/or
                                                  privileged information
                                                  is strictly
                                                  prohibited. If you
                                                  have received this
                                                  communication in
                                                  error, please erase
                                                  all copies of the
                                                  message and its
                                                  attachments and notify
                                                  the sender immediately
                                                  via reply e-mail. **
                                          
                                           
                                              From:
                                                  Jonathan Heese
                                                  

                                                  Sent: Tuesday,
                                                  March 17, 2015 12:36
                                                  PM

                                                  To:
                                                  'Ravishankar N'; 
gluster-users@xxxxxxxxxxx

                                                  Subject: RE:
                                                   I/O
                                                  error on replicated
                                                  volume
                                            
                                          
                                          Ravi,
                                           
                                          The
                                              last lines in the mount
                                              log before the massive
                                              vomit of I/O errors are
                                              from 22 minutes prior, and
                                              seem innocuous to me:
                                           
                                          [2015-03-16
                                              01:37:07.126340] E
                                              [client-handshake.c:1760:client_query_portmap_cbk]
                                              0-gluster_disk-client-0:
                                              failed to get the port
                                              number for remote
                                              subvolume. Please run
                                              'gluster volume status' on
                                              server to see if brick
                                              process is running.
                                          [2015-03-16
                                              01:37:07.126587] W
                                              [rdma.c:4273:gf_rdma_disconnect]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                                              [0x7fd9c557bccf]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                                              [0x7fd9c557a995]
                                              (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                                              [0x7fd9c0d8fb9a])))
                                              0-gluster_disk-client-0:
                                              disconnect called
                                              (peer:10.10.10.1:24008)
                                          [2015-03-16
                                              01:37:07.126687] E
                                              [client-handshake.c:1760:client_query_portmap_cbk]
                                              0-gluster_disk-client-1:
                                              failed to get the port
                                              number for remote
                                              subvolume. Please run
                                              'gluster volume status' on
                                              server to see if brick
                                              process is running.
                                          [2015-03-16
                                              01:37:07.126737] W
                                              [rdma.c:4273:gf_rdma_disconnect]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                                              [0x7fd9c557bccf]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                                              [0x7fd9c557a995]
                                              (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                                              [0x7fd9c0d8fb9a])))
                                              0-gluster_disk-client-1:
                                              disconnect called
                                              (peer:10.10.10.2:24008)
                                          [2015-03-16
                                              01:37:10.730165] I
                                              [rpc-clnt.c:1729:rpc_clnt_reconfig]
                                              0-gluster_disk-client-0:
                                              changing port to 49152
                                              (from 0)
                                          [2015-03-16
                                              01:37:10.730276] W
                                              [rdma.c:4273:gf_rdma_disconnect]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                                              [0x7fd9c557bccf]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                                              [0x7fd9c557a995]
                                              (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                                              [0x7fd9c0d8fb9a])))
                                              0-gluster_disk-client-0:
                                              disconnect called
                                              (peer:10.10.10.1:24008)
                                          [2015-03-16
                                              01:37:10.739500] I
                                              [rpc-clnt.c:1729:rpc_clnt_reconfig]
                                              0-gluster_disk-client-1:
                                              changing port to 49152
                                              (from 0)
                                          [2015-03-16
                                              01:37:10.739560] W
                                              [rdma.c:4273:gf_rdma_disconnect]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
                                              [0x7fd9c557bccf]
                                              (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
                                              [0x7fd9c557a995]
                                              (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)

                                              [0x7fd9c0d8fb9a])))
                                              0-gluster_disk-client-1:
                                              disconnect called
                                              (peer:10.10.10.2:24008)
                                          [2015-03-16
                                              01:37:10.741883] I
                                              [client-handshake.c:1677:select_server_supported_programs]
                                              0-gluster_disk-client-0:
                                              Using Program GlusterFS
                                              3.3, Num (1298437),
                                              Version (330)
                                          [2015-03-16
                                              01:37:10.744524] I
                                              [client-handshake.c:1462:client_setvolume_cbk]
                                              0-gluster_disk-client-0:
                                              Connected to
                                              10.10.10.1:49152, attached
                                              to remote volume
                                              '/bricks/brick1'.
                                          [2015-03-16
                                              01:37:10.744537] I
                                              [client-handshake.c:1474:client_setvolume_cbk]
                                              0-gluster_disk-client-0:
                                              Server and Client
                                              lk-version numbers are not
                                              same, reopening the fds
                                          [2015-03-16
                                              01:37:10.744566] I
                                              [afr-common.c:4267:afr_notify]
                                              0-gluster_disk-replicate-0:
                                              Subvolume
                                              'gluster_disk-client-0'
                                              came back up; going
                                              online.
                                          [2015-03-16
                                              01:37:10.744627] I
                                              [client-handshake.c:450:client_set_lk_version_cbk]
                                              0-gluster_disk-client-0:
                                              Server lk version = 1
                                          [2015-03-16
                                              01:37:10.753037] I
                                              [client-handshake.c:1677:select_server_supported_programs]
                                              0-gluster_disk-client-1:
                                              Using Program GlusterFS
                                              3.3, Num (1298437),
                                              Version (330)
                                          [2015-03-16
                                              01:37:10.755657] I
                                              [client-handshake.c:1462:client_setvolume_cbk]
                                              0-gluster_disk-client-1:
                                              Connected to
                                              10.10.10.2:49152, attached
                                              to remote volume
                                              '/bricks/brick1'.
                                          [2015-03-16
                                              01:37:10.755676] I
                                              [client-handshake.c:1474:client_setvolume_cbk]
                                              0-gluster_disk-client-1:
                                              Server and Client
                                              lk-version numbers are not
                                              same, reopening the fds
                                          [2015-03-16
                                              01:37:10.761945] I
                                              [fuse-bridge.c:5016:fuse_graph_setup]
                                              0-fuse: switched to graph
                                              0
                                          [2015-03-16
                                              01:37:10.762144] I
                                              [client-handshake.c:450:client_set_lk_version_cbk]
                                              0-gluster_disk-client-1:
                                              Server lk version = 1
                                          [2015-03-16
                                                01:37:10.762279] I
                                              [fuse-bridge.c:3953:fuse_init]
                                              0-glusterfs-fuse: FUSE
                                              inited with protocol
                                              versions: glusterfs 7.22
                                              kernel 7.14
                                          [2015-03-16
                                                01:59:26.098670] W
                                              [fuse-bridge.c:2242:fuse_writev_cbk]
                                              0-glusterfs-fuse: 292084:
                                              WRITE => -1
                                              (Input/output error)
                                          …
                                           
                                          I’ve
                                              seen no indication of
                                              split-brain on any files
                                              at any point in this (ever
                                              since downdating from
                                              3.6.2 to 3.5.3, which is
                                              when this particular issue
                                              started):
                                          [root@duke
                                              gfapi-module-for-linux-target-driver-]#
                                              gluster v heal
                                              gluster_disk info
                                          Brick
duke.jonheese.local:/bricks/brick1/
                                          Number
                                              of entries: 0
                                           
                                          Brick
duchess.jonheese.local:/bricks/brick1/
                                          Number
                                              of entries: 0
                                           
                                          Thanks.
                                           
                                          
                                            Jon
                                                  Heese

                                              Systems
                                                  Engineer

                                              INetU
                                                  Managed Hosting

                                              P:
                                                610.266.7441 x 261

                                              F:
                                                610.266.7434

                                              www.inetu.net
                                            ** This
                                                  message contains
                                                  confidential
                                                  information, which
                                                  also may be
                                                  privileged, and is
                                                  intended only for the
                                                  person(s) addressed
                                                  above. Any
                                                  unauthorized use,
                                                  distribution, copying
                                                  or disclosure of
                                                  confidential and/or
                                                  privileged information
                                                  is strictly
                                                  prohibited. If you
                                                  have received this
                                                  communication in
                                                  error, please erase
                                                  all copies of the
                                                  message and its
                                                  attachments and notify
                                                  the sender immediately
                                                  via reply e-mail. **
                                          
                                           
                                              From:
                                                  Ravishankar N [mailto:ravishankar@xxxxxxxxxx]
                                                  

                                                  Sent: Tuesday,
                                                  March 17, 2015 12:35
                                                  AM

                                                  To: Jonathan
                                                  Heese; gluster-users@xxxxxxxxxxx

                                                  Subject: Re:
                                                   I/O
                                                  error on replicated
                                                  volume
                                            
                                          
                                            On
                                              03/17/2015 02:14 AM,
                                              Jonathan Heese wrote:
                                          
                                          
                                                Hello,

                                                    
                                                    So I resolved my
                                                    previous issue with
                                                    split-brains and the
                                                    lack of self-healing
                                                    by dropping my
                                                    installed glusterfs*
                                                    packages from 3.6.2
                                                    to 3.5.3, but now
                                                    I've picked up a new
                                                    issue, which
                                                    actually makes
                                                    normal use of the
                                                    volume practically
                                                    impossible.

                                                    
                                                    A little background
                                                    for those not
                                                    already paying close
                                                    attention:

                                                    I have a 2 node 2
                                                    brick replicating
                                                    volume whose purpose
                                                    in life is to hold
                                                    iSCSI target files,
                                                    primarily for use to
                                                    provide datastores
                                                    to a VMware ESXi
                                                    cluster.  The plan
                                                    is to put a handful
                                                    of image files on
                                                    the Gluster volume,
                                                    mount them locally
                                                    on both Gluster
                                                    nodes, and run tgtd
                                                    on both, pointed to
                                                    the image files on
                                                    the mounted gluster
                                                    volume. Then the
                                                    ESXi boxes will use
                                                    multipath
                                                    (active/passive)
                                                    iSCSI to connect
                                                    to the nodes, with
                                                    automatic failover
                                                    in case of planned
                                                    or unplanned
                                                    downtime of the
                                                    Gluster nodes.

                                                    
                                                    In my most recent
                                                    round of testing
                                                    with 3.5.3, I'm
                                                    seeing a massive
                                                    failure to write
                                                    data to the volume
                                                    after about 5-10
                                                    minutes, so I've
                                                    simplified the
                                                    scenario a bit (to
                                                    minimize the
                                                    variables) to: both
                                                    Gluster nodes up,
                                                    only one node (duke)
                                                    mounted and running
                                                    tgtd, and just
                                                    regular (single
                                                    path) iSCSI from a
                                                    single ESXi server.

                                                    
                                                    About 5-10 minutes
                                                    into migration a VM
                                                    onto the test
                                                    datastore,
                                                    /var/log/messages on
                                                    duke gets blasted
                                                    with a ton of
                                                    messages exactly
                                                    like this:
                                                Mar
                                                  15 22:24:06 duke tgtd:
                                                  bs_rdwr_request(180)
                                                  io error 0x1781e00 2a
                                                  -1 512 22971904,
                                                  Input/output error
                                                 
                                                And
                                                  /var/log/glusterfs/mnt-gluster_disk.log
                                                  gets blased with a ton
                                                  of messages exactly
                                                  like this:
                                                [2015-03-16
                                                  02:24:07.572279] W
                                                  [fuse-bridge.c:2242:fuse_writev_cbk]
                                                  0-glusterfs-fuse:
                                                  635299: WRITE => -1
                                                  (Input/output error)
                                                 
                                              
                                              Are there any messages in
                                              the mount log from AFR
                                              about split-brain just
                                              before the above line
                                              appears?

                                              Does `gluster v heal
                                              <VOLNAME> info` show
                                              any files? Performing I/O
                                              on files that are in
                                              split-brain fail with EIO.

                                              
                                              -Ravi

                                              
                                                And
                                                  the write operation
                                                  from VMware's side
                                                  fails as soon as these
                                                  messages start.
                                                 
                                                I
                                                  don't see any other
                                                  errors (in the log
                                                  files I know of)
                                                  indicating the root
                                                  cause of these i/o
                                                  errors.  I'm sure that
                                                  this is not enough
                                                  information to tell
                                                  what's going on, but
                                                  can anyone help me
                                                  figure out what to
                                                  look at next to figure
                                                  this out?
                                                 
                                                I've
                                                  also considered using
                                                  Dan Lambright's
                                                  libgfapi gluster
                                                  module for tgtd (or
                                                  something similar) to
                                                  avoid going through
                                                  FUSE, but I'm not sure
                                                  whether that would be
                                                  irrelevant to this
                                                  problem, since I'm not
                                                  100% sure if it lies
                                                  in FUSE or elsewhere.
                                                 
                                                Thanks!
                                                 
                                                Jon
                                                      Heese

                                                  Systems
                                                      Engineer

                                                  INetU
                                                      Managed Hosting

                                                  P:
                                                    610.266.7441 x 261

                                                  F:
                                                    610.266.7434

                                                  www.inetu.net
                                                **
                                                      This message
                                                      contains
                                                      confidential
                                                      information, which
                                                      also may be
                                                      privileged, and is
                                                      intended only for
                                                      the person(s)
                                                      addressed above.
                                                      Any unauthorized
                                                      use, distribution,
                                                      copying or
                                                      disclosure of
                                                      confidential
                                                      and/or privileged
                                                      information is
                                                      strictly
                                                      prohibited. If you
                                                      have received this
                                                      communication in
                                                      error, please
                                                      erase all copies
                                                      of the message and
                                                      its attachments
                                                      and notify the
                                                      sender immediately
                                                      via reply e-mail.
                                                      **
                                                 
                                              
                                            _______________________________________________
                                            Gluster-users mailing list
                                            Gluster-users@xxxxxxxxxxx
                                            http://www.gluster.org/mailman/listinfo/gluster-users
                                          
                                           
                                        _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
                                      
                                      
                _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
              
              
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users