Re: libgfapi failover problem on replica bricks

Roman <romeo.r@xxxxxxxxx> · Tue, 5 Aug 2014 11:36:27 +0300

Well, it seems like it doesn't see that changes were made to the volume. I created two files, 200 MB and 100 MB (from /dev/zero), after I disconnected the first brick. Then I connected it back and got these logs:

[2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
[2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
[2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
[2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1

[2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
This line seems weird to me, to be honest.
I do not see any traffic on the switch interfaces between the gluster servers, which means there is no syncing between them.
I tried to ls -l the files on the client and on the servers to trigger the healing, but with no success so far. Should I wait longer?
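
One way to answer "should I wait more?" without the 3.5.x heal-info command is the getfattr check suggested earlier in this thread (quoted below). The following is a minimal sketch, not an official tool. It assumes the AFR changelog xattrs on the bricks are named trusted.afr.<volname>-client-<N> and hold 12 bytes, read as three big-endian 32-bit counters (data, metadata, entry). The function names and the sample output are hypothetical and made up for illustration; they are not taken from these servers.

```python
import re

# Matches lines such as:
#   trusted.afr.HA-fast-150G-PVE1-client-1=0x0000003c0000000000000000
AFR_LINE = re.compile(r"^(trusted\.afr\.[^=\s]+)=0x([0-9a-fA-F]{24})$")


def parse_afr_xattrs(getfattr_output):
    """Return {xattr_name: (data, metadata, entry)} pending counters."""
    pending = {}
    for line in getfattr_output.splitlines():
        match = AFR_LINE.match(line.strip())
        if not match:
            continue  # skip "# file:" headers and non-AFR xattrs
        name, hex_value = match.groups()
        # Assumption: 12 bytes = three big-endian 32-bit counters.
        counters = tuple(int(hex_value[i:i + 8], 16) for i in (0, 8, 16))
        pending[name] = counters
    return pending


def heal_pending(getfattr_output):
    """True if any AFR counter in the output is non-zero."""
    return any(any(c) for c in parse_afr_xattrs(getfattr_output).values())


# Hypothetical output of 'getfattr -d -m. -e hex <image-file-on-bricks>'.
SAMPLE = """\
# file: exports/fast-test/150G/images/124/vm-124-disk-1.qcow2
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
trusted.afr.HA-fast-150G-PVE1-client-1=0x0000003c0000000000000000
trusted.gfid=0x0123456789abcdef0123456789abcdef
"""

if __name__ == "__main__":
    for name, counters in parse_afr_xattrs(SAMPLE).items():
        print(name, counters)
    print("heal pending:", heal_pending(SAMPLE))
```

Under these assumptions, all-zero counters on every brick would suggest that nothing is waiting to be healed for that file, while a non-zero counter means the brick still records pending operations against the named client subvolume.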

2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

    On 08/05/2014 01:10 PM, Roman wrote:

      Ahha! For some reason I was not able to start the VM anymore. Proxmox VE
      told me that it could not read the qcow2 header because permission was
      denied. So I just deleted that file and created a new VM. And the next
      message I got was this:

    Seems like these are the messages where you took down the bricks
    before self-heal. Could you restart the run waiting for self-heals
    to complete before taking down the next brick?

    Pranith

            [2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 60 ] [ 11 0 ] ]
            [2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background  data self-heal failed on /images/124/vm-124-disk-1.qcow2
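
For readers decoding the "Pending matrix" in the message above, here is a small hedged sketch. It assumes that row i of the matrix is what brick i records as pending against each brick, so that two non-zero off-diagonal entries mean each copy blames the other, which is the situation the log calls a possible split-brain. The function names are hypothetical; only the log text comes from this thread.

```python
import re


def parse_pending_matrix(log_text):
    """Extract the pending matrix from an AFR split-brain log message."""
    # Join wrapped lines first, since mail clients often break the message.
    flat = " ".join(log_text.split())
    match = re.search(r"Pending matrix:\s*\[((?:\s*\[[\d\s]*\])+)\s*\]", flat)
    if not match:
        return None
    rows = re.findall(r"\[([\d\s]*)\]", match.group(1))
    return [[int(n) for n in row.split()] for row in rows]


def mutual_accusation(matrix):
    """True if some pair of bricks each records pending operations on the other."""
    size = len(matrix)
    return any(
        matrix[i][j] > 0 and matrix[j][i] > 0
        for i in range(size)
        for j in range(i + 1, size)
    )


# The message from this thread, wrapped the way it arrived in the mail.
SAMPLE_LOG = """Unable to self-heal contents of
'/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete
the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 60 ] [ 11 0 ]
]"""

if __name__ == "__main__":
    matrix = parse_pending_matrix(SAMPLE_LOG)
    print(matrix, mutual_accusation(matrix))
```

Under that reading, the matrix here says the first brick holds 60 pending operations against the second and the second holds 11 against the first, which fits bricks being taken down in turn before a heal completed.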

        2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

             I just responded to your earlier mail about how the log looks. The
             log appears in the mount's log file.

                  Pranith

                  On 08/05/2014 12:41 PM, Roman wrote:

                    OK, so I've waited long enough, I think. There was no traffic on the
                    switch ports between the servers. I could not find any log message
                    about a completed self-heal (I waited about 30 minutes). This time I
                    unplugged the other server's UTP cable and ended up in the same
                    situation:

                        root@gluster-test1:~# cat /var/log/dmesg
                        -bash: /bin/cat: Input/output error

                      brick logs:

                        [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
                        [2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
                        [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
                        [2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0

                      2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                           Do you think it is possible for you to run these tests on the
                           latest version, 3.5.2? 'gluster volume heal <volname> info' would
                           give you that information in versions > 3.5.1.

                           Otherwise you will have to check it either from the logs (there
                           will be a self-heal completed message in the mount logs) or by
                           observing 'getfattr -d -m. -e hex <image-file-on-bricks>'

                                Pranith

                                On 08/05/2014 12:09 PM, Roman
                                  wrote:

                                  OK, I understand. I will try this shortly.
                                  How can I be sure that the healing process is done if I am
                                  not able to see its status?

                                    2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                         Mounts will do the healing, not the self-heal-daemon. The
                                         point, I feel, is that whichever process does the healing
                                         needs to have the latest information about the good bricks
                                         in this use case. Since in the VM use case the mounts
                                         should have the latest information, we should let the
                                         mounts do the healing. If the mount accesses the VM image,
                                         either through someone doing operations inside the VM or
                                         through an explicit stat on the file, it should do the
                                         healing.

                                              Pranith.

                                              On 08/05/2014 10:39 AM, Roman wrote:

                                                Hmmm, you told me to turn it off. Did I misunderstand
                                                something? After I issued the command you sent me, I
                                                was not able to watch the healing process; it said it
                                                won't be healed because it's turned off.

                                                  2014-08-05

                                                    5:39 GMT+03:00
                                                    Pranith Kumar
                                                    Karampuri <pkarampu@xxxxxxxxxx>:

                                                        You didn't mention anything about self-healing.
                                                        Did you wait until the self-heal is complete?

                                                          Pranith

                                                          On 08/04/2014 05:49 PM, Roman wrote:

                                                          Hi!
                                                          The result is pretty much the same. I set the switch port
                                                          down for the 1st server, and it was OK. Then I set it back
                                                          up and set the other server's port down, and that triggered
                                                          IO errors on two virtual machines: one with a local root FS
                                                          but network-mounted storage, and the other with a network
                                                          root FS. The 1st gave an error on copying to or from the
                                                          mounted network disk; the other gave me an error even when
                                                          reading log files.

                                                          cat: /var/log/alternatives.log: Input/output error

                                                          Then I reset the KVM VM and it told me there is no boot
                                                          device. Next I virtually powered it off and then back on,
                                                          and it booted.

                                                          By the way, did I have to start/stop the volume?

                                                          >> Could you do the following and test it again?
                                                          >> gluster volume set <volname> cluster.self-heal-daemon off
                                                          >> Pranith

                                                          2014-08-04

                                                          14:10
                                                          GMT+03:00
                                                          Pranith Kumar
                                                          Karampuri <pkarampu@xxxxxxxxxx>:

                                                          On 08/04/2014 03:33 PM, Roman wrote:

                                                          Hello!

                                                          I'm facing the same problem as mentioned here:

                                                          http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html

                                                          My setup is up and running, so I'm ready to help you back
                                                          with feedback.

                                                          Setup:
                                                          Proxmox server as client
                                                          2 physical gluster servers

                                                          Server side and client side are both currently running
                                                          glusterfs 3.4.4 from the gluster repo.

                                                          The problem is:

                                                          1. Created replica bricks.
                                                          2. Mounted in Proxmox (tried both Proxmox ways: via the GUI
                                                             and via fstab (with a backup volume line); by the way,
                                                             when mounting via fstab I'm unable to launch a VM without
                                                             cache, even though direct-io-mode is enabled in the fstab
                                                             line).
                                                          3. Installed a VM.
                                                          4. Brought one volume down: OK.
                                                          5. Brought it back up and waited for the sync to finish.
                                                          6. Brought the other volume down: got IO errors on the VM
                                                             guest and was not able to restore the VM after I reset
                                                             it via the host. It says (no bootable media). After I
                                                             shut it down (forced) and bring it back up, it boots.

                                                          Could you do the following and test it again?

                                                          gluster volume set <volname> cluster.self-heal-daemon off

                                                          Pranith

                                                          Need help. Tried 3.4.3 and 3.4.4. Packages for 3.4.5 and
                                                          3.5.2 for Debian are still missing (3.5.1 always gives a
                                                          healing error for some reason).

                                                          -- 

                                                          Best regards,

                                                          Roman. 

                                                          _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users

-- 
Best regards,
Roman.
