Re: libgfapi failover problem on replica bricks

Sorry, "gluster-users" somehow dropped off the recipient list, so I'm replying to it with the full history.
I'm watching the mount's logfile with the tail -f command and I can't see any such messages... it seems to be taking forever. Roughly how long should self-heal take to complete? The mount is almost empty; the only thing on it is a striped file with the VM image.
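For reference, the command I'm running is roughly this (the log path is just the default GlusterFS FUSE client log name for a mount at /mnt/pve/HA-fast-150G-PVE1; the actual mount point on my setup may differ):

tail -f /var/log/glusterfs/mnt-pve-HA-fast-150G-PVE1.log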

The only logs I see are:

[2014-08-05 07:12:03.808352] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor2-31563-2014/08/05-06:10:19:381800-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
[2014-08-05 07:12:04.547935] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from sisemon-262292-2014/08/04-13:27:19:221777-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
[2014-08-05 07:12:06.761596] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
[2014-08-05 07:12:09.151322] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from pve1-27476-2014/08/04-13:27:19:838805-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)



2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
I just responded to your earlier mail about what the log message looks like. It appears in the mount's logfile.

Pranith

On 08/05/2014 12:41 PM, Roman wrote:
OK, I think I've waited long enough. There was no traffic at all on the switch ports between the servers, and I could not find any log message about a completed self-heal (I waited about 30 minutes). This time I pulled out the other server's UTP cable and ended up in the same situation:
root@gluster-test1:~# cat /var/log/dmesg
-bash: /bin/cat: Input/output error

brick logs:
[2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
[2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0





2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
Do you think it is possible for you to do these tests on the latest version 3.5.2? 'gluster volume heal <volname> info' would give you that information in versions > 3.5.1.
Otherwise you will have to check it either from the logs (there will be a "self-heal completed" message in the mount logs) or by observing 'getfattr -d -m. -e hex <image-file-on-bricks>'.
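For example, roughly like this (the volume name below is taken from your logs, and <brick-path> is just a placeholder for wherever the brick lives on each server):

# on 3.5.2 you can simply ask gluster:
gluster volume heal HA-fast-150G-PVE1 info

# or check the AFR changelog xattrs directly on each brick server:
getfattr -d -m. -e hex <brick-path>/images/124/vm-124-disk-1.qcow2

Roughly speaking, once the trusted.afr.HA-fast-150G-PVE1-client-* values are all zeroes on both bricks (for example 0x000000000000000000000000), there is nothing left pending to heal for that file.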

Pranith


On 08/05/2014 12:09 PM, Roman wrote:
Ok, I understand. I will try this shortly.
How can I be sure that the healing process is done if I am not able to see its status?


2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
Mounts will do the healing, not the self-heal daemon. The point is that whichever process does the healing needs to have the latest information about which bricks are good, and in the VM use case it is the mounts that have that information, so we should let the mounts do the healing. If the mount accesses the VM image, either because someone performs operations inside the VM or because of an explicit stat on the file, it will do the healing.
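In other words, something like this from the client side is enough to kick off the heal for that file (the mount path here is just an example; use wherever the volume is actually mounted):

stat /mnt/pve/HA-fast-150G-PVE1/images/124/vm-124-disk-1.qcow2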

Pranith.


On 08/05/2014 10:39 AM, Roman wrote:
Hmmm, you told me to turn it off. Did I understand something wrong? After I issued the command you sent me, I was not able to watch the healing process; it said the file won't be healed because it is turned off.


2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
You didn't mention anything about self-healing. Did you wait until the self-heal is complete?

Pranith

On 08/04/2014 05:49 PM, Roman wrote:
Hi!
The result is pretty much the same. I set the switch port down for the 1st server, and that was OK. Then I brought it back up and set the other server's port down, and that triggered IO errors on two virtual machines: one with a local root FS but network-mounted storage, and the other with a network root FS. The 1st gave an error when copying to or from the mounted network disk; the other gave an error even when just reading log files.

cat: /var/log/alternatives.log: Input/output error
Then I reset the KVM VM and it told me there is no boot device. Next I virtually powered it off and back on, and it booted.

By the way, did I have to start/stop the volume?

>> Could you do the following and test it again?
>> gluster volume set <volname> cluster.self-heal-daemon off

>> Pranith




2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 08/04/2014 03:33 PM, Roman wrote:
Hello!

Facing the same problem as mentioned here:


My setup is up and running, so I'm ready to help you with feedback in return.

Setup:
Proxmox server as the client
2 physical GlusterFS servers

Both the server side and the client side are currently running GlusterFS 3.4.4 from the Gluster repo.

The problem is:

1. Created replica bricks.
2. Mounted them in Proxmox (tried both Proxmox ways: via the GUI and via fstab with the backup volume line; by the way, while mounting via fstab I'm unable to launch a VM without cache, even though direct-io-mode is enabled in the fstab line - see the example fstab line right after this list).
3. Installed a VM.
4. Brought one volume down - OK.
5. Brought it back up and waited for the sync to finish.
6. Brought the other volume down - got IO errors on the VM guest and was not able to restore the VM after resetting it via the host. It says "no bootable media". After I shut it down (forced) and brought it back up, it boots.
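This is roughly what my fstab line looks like (the hostnames and mount point are from my setup and may differ; backupvolfile-server and direct-io-mode are the options I mean by the backup volume line and direct IO):

stor1:/HA-fast-150G-PVE1 /mnt/pve/HA-fast-150G-PVE1 glusterfs defaults,_netdev,backupvolfile-server=stor2,direct-io-mode=enable 0 0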
Could you do the following and test it again?
gluster volume set <volname> cluster.self-heal-daemon off
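For example (the volume name is taken from your logs; once it is set, the option should show up in the output of 'gluster volume info'):

gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
gluster volume info HA-fast-150G-PVE1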

Pranith

I need help. I've tried 3.4.3 and 3.4.4.
Debian packages are still missing for 3.4.5 and 3.5.2 (3.5.1 always gives a healing error for some reason).

--
Best regards,
Roman.


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
