Re: libgfapi failover problem on replica bricks

Roman <romeo.r@xxxxxxxxx> · Wed, 6 Aug 2014 09:57:38 +0300

Yesterday I reproduced this situation twice.
The setup:
1. Hardware and network
   a. Disks INTEL SSDSC2BB240G4
   b1. Network cards: X540-AT2
   b2. Netgear 10g switch
2. Software setup:
   a. OS: Debian wheezy
   b. Glusterfs: 3.4.4 (latest 3.4.4 from gluster repository)
   c. Proxmox VE with glusterfs updated from the gluster repository
3. Software Configuration
   a. create a replicated volume with the options cluster.self-heal-daemon: off; nfs.disable: off; network.ping-timeout: 2
   b. mount it on Proxmox VE (via the Proxmox GUI; it mounts with these options: stor1:HA-fast-150G-PVE1 on /mnt/pve/FAST-TESt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072))
   c. install a VM with a qcow2 or raw disk image.
   d. disable the port of / remove the network cable from one of the storage servers
   e. wait and put the cable back
   f. keep waiting for sync (pointless, it won't ever start)
   g. disable the port of the second server (or remove the cable from the second server)
   h. profit.
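
For anyone repeating step 3a, here is a minimal Python sketch that only builds (and prints) the gluster CLI commands for such a volume; it does not run them. The volume name and brick path are taken from the mount line and logs in this thread; that the two bricks live on stor1 and stor2 under the same path is my assumption, and build_commands is a made-up helper name.

```python
# Sketch only: prints the commands, does not execute anything.
VOLUME = "HA-fast-150G-PVE1"
# Assumption: both bricks use the same export path on stor1 and stor2.
BRICKS = ["stor1:/exports/fast-test/150G", "stor2:/exports/fast-test/150G"]
# Options exactly as listed in step 3a.
OPTIONS = {
    "cluster.self-heal-daemon": "off",
    "nfs.disable": "off",
    "network.ping-timeout": "2",
}


def build_commands(volume, bricks, options):
    """Return argv lists for creating, configuring and starting a replica volume."""
    commands = [
        ["gluster", "volume", "create", volume, "replica", str(len(bricks))] + list(bricks)
    ]
    for key, value in options.items():
        commands.append(["gluster", "volume", "set", volume, key, value])
    commands.append(["gluster", "volume", "start", volume])
    return commands


commands = build_commands(VOLUME, BRICKS, OPTIONS)
for argv in commands:
    print(" ".join(argv))
```

The commands are kept as argv lists so they could be handed to subprocess.run() unchanged if someone wants to script the whole test run.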

Maybe I could use 3.5.2 from debian sid (testing) repository to test with?

2014-08-06 9:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

    Roman,

        The file went into split-brain. I think we should do these tests
    with 3.5.2, where monitoring the heals is easier. Let me also come
    up with a document about how to do the testing you are trying to
    do.

    Humble/Niels,

        Do we have debs available for 3.5.2? In 3.5.1 there was a
    packaging issue where /usr/bin/glfsheal was not packaged along with
    the deb. I think that should be fixed now as well?

    Pranith

    On 08/06/2014 11:52 AM, Roman wrote:

      good morning, 

          root@stor1:~# getfattr -d -m. -e hex
            /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
          getfattr: Removing leading '/' from absolute path names
          # file:
            exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
          trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
          trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
          trusted.gfid=0x23c79523075a4158bea38078da570449

          getfattr: Removing leading '/' from absolute path names
          # file:
            exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
          trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
          trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
          trusted.gfid=0x23c79523075a4158bea38078da570449
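
For readers following along: each trusted.afr.* value above is 12 bytes, holding three big-endian 32-bit counters for pending data, metadata and entry operations. A minimal Python sketch of decoding the two outputs, assuming the first getfattr block is from stor1 and the second from stor2, and that client-0 and client-1 refer to the stor1 and stor2 bricks respectively (decode_afr_changelog is a made-up helper name):

```python
import struct


def decode_afr_changelog(hex_value):
    """Return (data, metadata, entry) pending counters from a trusted.afr.* hex value."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return struct.unpack(">III", raw)


# Values copied from the getfattr output above.
on_stor1 = {
    "client-0": "0x000000000000000000000000",
    "client-1": "0x000001320000000000000000",
}
on_stor2 = {
    "client-0": "0x000000040000000000000000",
    "client-1": "0x000000000000000000000000",
}

# What each brick records as pending against the *other* brick.
stor1_blames_stor2 = decode_afr_changelog(on_stor1["client-1"])
stor2_blames_stor1 = decode_afr_changelog(on_stor2["client-0"])

# Data split-brain: both copies hold pending data operations for each other.
data_split_brain = stor1_blames_stor2[0] > 0 and stor2_blames_stor1[0] > 0

print("stor1 -> stor2 pending (data, metadata, entry):", stor1_blames_stor2)
print("stor2 -> stor1 pending (data, metadata, entry):", stor2_blames_stor1)
print("data split-brain:", data_split_brain)
```

Under those assumptions each brick accuses the other of missing data writes, so neither copy can be picked as the heal source automatically.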

        2014-08-06 9:20 GMT+03:00 Pranith Kumar
          Karampuri <pkarampu@xxxxxxxxxx>:

                On 08/06/2014 11:30 AM, Roman wrote:

                  Also, this time files are not the same!

                      root@stor1:~# md5sum
                        /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                      32411360c53116b96a059f17306caeda
                         /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

                      root@stor2:~# md5sum
                        /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                      65b8a6031bcb6f5fb3a11cb1e8b1c9c9
                         /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

              What is the getfattr output?

                  Pranith

                      2014-08-05 16:33
                        GMT+03:00 Roman <romeo.r@xxxxxxxxx>:

                          Nope, it is not working. But
                            this time it went a bit differently:

                              root@gluster-client:~# dmesg
                              Segmentation fault

                            I was not even able to start the VM
                              after I had done the tests:

                                Could not read qcow2 header: Operation
                                not permitted

                            And it seems it never starts to sync
                              files after the first disconnect. The VM
                              survives the first disconnect, but not the
                              second (I waited around 30 minutes). Also,
                              I've got network.ping-timeout: 2 in the
                              volume settings, but the logs reacted to
                              the first disconnect only after around 30
                              seconds. The second was faster, 2 seconds.

                            Reaction was different also:

                            slower one:

                              [2014-08-05 13:26:19.558435] W
                                [socket.c:514:__socket_rwv] 0-glusterfs:
                                readv failed (Connection timed out)
                              [2014-08-05 13:26:19.558485] W
                                [socket.c:1962:__socket_proto_state_machine]
                                0-glusterfs: reading from socket failed.
                                Error (Connection timed out), peer (10.250.0.1:24007)
                              [2014-08-05 13:26:21.281426] W
                                [socket.c:514:__socket_rwv]
                                0-HA-fast-150G-PVE1-client-0: readv
                                failed (Connection timed out)
                              [2014-08-05 13:26:21.281474] W
                                [socket.c:1962:__socket_proto_state_machine]
                                0-HA-fast-150G-PVE1-client-0: reading
                                from socket failed. Error (Connection
                                timed out), peer (10.250.0.1:49153)
                              [2014-08-05 13:26:21.281507] I
                                [client.c:2098:client_rpc_notify]
                                0-HA-fast-150G-PVE1-client-0:
                                disconnected

                            the fast one:

                              [2014-08-05 12:52:44.607389] C
                                [client-handshake.c:127:rpc_client_ping_timer_expired]
                                0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
                                has not responded in the last 2 seconds,
                                disconnecting.
                              [2014-08-05 12:52:44.607491] W
                                [socket.c:514:__socket_rwv]
                                0-HA-fast-150G-PVE1-client-1: readv
                                failed (No data available)
                              [2014-08-05 12:52:44.607585] E
                                [rpc-clnt.c:368:saved_frames_unwind]
                                (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
                                [0x7fcb1b4b0558]
                                (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)

                                [0x7fcb1b4aea63]
                                (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
                                [0x7fcb1b4ae97e])))
                                0-HA-fast-150G-PVE1-client-1: forced
                                unwinding frame type(GlusterFS 3.3)
                                op(LOOKUP(27)) called at 2014-08-05
                                12:52:42.463881 (xid=0x381883x)
                              [2014-08-05 12:52:44.607604] W
                                [client-rpc-fops.c:2624:client3_3_lookup_cbk]
                                0-HA-fast-150G-PVE1-client-1: remote
                                operation failed: Transport endpoint is
                                not connected. Path: /
                                (00000000-0000-0000-0000-000000000001)
                              [2014-08-05 12:52:44.607736] E
                                [rpc-clnt.c:368:saved_frames_unwind]
                                (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
                                [0x7fcb1b4b0558]
                                (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)

                                [0x7fcb1b4aea63]
                                (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
                                [0x7fcb1b4ae97e])))
                                0-HA-fast-150G-PVE1-client-1: forced
                                unwinding frame type(GlusterFS
                                Handshake) op(PING(3)) called at
                                2014-08-05 12:52:42.463891
                                (xid=0x381884x)
                              [2014-08-05 12:52:44.607753] W
                                [client-handshake.c:276:client_ping_cbk]
                                0-HA-fast-150G-PVE1-client-1: timer must
                                have expired
                              [2014-08-05 12:52:44.607776] I
                                [client.c:2098:client_rpc_notify]
                                0-HA-fast-150G-PVE1-client-1:
                                disconnected
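
The "fast" disconnect can be checked against network.ping-timeout: 2 by subtracting the "called at" time of the unwound PING frame from the time the ping timer expired. A small Python sketch using only timestamps copied from the log above (seconds_between is a made-up helper name):

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start, end):
    """Return the number of seconds between two gluster log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# PING(3) frame "called at" time, and the rpc_client_ping_timer_expired time.
ping_sent = "2014-08-05 12:52:42.463891"
timer_expired = "2014-08-05 12:52:44.607389"

elapsed = seconds_between(ping_sent, timer_expired)
print("ping outstanding for %.6f s before disconnect" % elapsed)
```

That comes to a little over two seconds, which matches the configured timeout. The "slow" log has no such ping-timer line at all; it only shows readv failing with "Connection timed out".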

                            I've got SSD disks (just for information).
                            Should I go ahead and give 3.5.2 a try?

                             2014-08-05 13:06
                              GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                     Please reply along
                                      with gluster-users :-). Maybe
                                      you are hitting 'reply' instead
                                      of 'reply all'?

                                          Pranith

                                          On 08/05/2014 03:35 PM,
                                            Roman wrote:

                                            To make sure
                                              and start clean, I've created
                                              another VM with raw format
                                              and am going to repeat those
                                              steps. So now I've got two
                                              VMs, one with qcow2 format
                                              and the other with raw format.
                                              I will send another e-mail
                                              shortly.

                                              2014-08-05

                                                13:01 GMT+03:00 Pranith
                                                Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                                      On 08/05/2014
                                                        03:07 PM, Roman
                                                        wrote:

                                                        really,
                                                          seems like the
                                                          same file

                                                          stor1:
                                                          a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

                                                          stor2:
                                                          a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

                                                          One thing I've seen in the logs:
                                                          is Proxmox VE somehow connecting
                                                          to the servers with the wrong
                                                          version?
                                                          [2014-08-05 09:23:45.218550] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)

                                                    It is the rpc (over
                                                    the network data
                                                    structures) version,
                                                    which has not changed
                                                    at all since 3.3, so
                                                    that's not a problem.
                                                    So what is the
                                                    conclusion? Is your
                                                    test case working
                                                    now or not?

                                                        Pranith

                                                          but if I issue:

                                                          root@pve1:~# glusterfs -V
                                                          glusterfs 3.4.4 built on Jun 28 2014 03:44:57

                                                          seems ok.

                                                          Meanwhile, the servers use 3.4.4:
                                                          [2014-08-05 09:23:45.117875] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 (version: 3.4.4)
                                                          [2014-08-05 09:23:49.103035] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)

                                                          If this could be the reason,
                                                          of course.
                                                          I did restart Proxmox VE
                                                          yesterday (just for
                                                          information).

                                                          2014-08-05

                                                          12:30
                                                          GMT+03:00
                                                          Pranith Kumar
                                                          Karampuri <pkarampu@xxxxxxxxxx>:

                                                          On
                                                          08/05/2014
                                                          02:33 PM,
                                                          Roman wrote:

                                                          Waited long enough for now,
                                                          still different sizes and no
                                                          logs about healing :(

                                                          stor1

                                                          # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                          trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
                                                          trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
                                                          trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921

                                                          root@stor1:~# du -sh /exports/fast-test/150G/images/127/
                                                          1.2G    /exports/fast-test/150G/images/127/

                                                          stor2

                                                          # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                          trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
                                                          trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
                                                          trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921

                                                          root@stor2:~# du -sh /exports/fast-test/150G/images/127/
                                                          1.4G    /exports/fast-test/150G/images/127/

                                                          According to
                                                          the
                                                          changelogs,
                                                          the file
                                                          doesn't need
                                                          any healing.
                                                          Could you stop
                                                          the operations
                                                          on the VMs and
                                                          take md5sum on
                                                          both these
                                                          machines?

                                                          Pranith

                                                          2014-08-05

                                                          11:49
                                                          GMT+03:00
                                                          Pranith Kumar
                                                          Karampuri <pkarampu@xxxxxxxxxx>:

                                                          On
                                                          08/05/2014
                                                          02:06 PM,
                                                          Roman wrote:

                                                          Well, it seems like it doesn't
                                                          see that changes were made to
                                                          the volume? I created two
                                                          files, 200 MB and 100 MB (from
                                                          /dev/zero), after I
                                                          disconnected the first brick.
                                                          Then I connected it back and
                                                          got these logs:

                                                          [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
                                                          [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
                                                          [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
                                                          [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
                                                          [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
                                                          [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
                                                          [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1

                                                          [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

                                                          This line seems weird to me
                                                          tbh.
                                                          I do not see any traffic on
                                                          the switch interfaces between
                                                          the gluster servers, which
                                                          means there is no syncing
                                                          between them.
                                                          I tried to ls -l the files on
                                                          the client and the servers to
                                                          trigger the healing, but with
                                                          no success. Should I wait
                                                          more?

                                                          Yes, it should take around
                                                          10-15 minutes. Could you
                                                          provide the output of
                                                          'getfattr -d -m. -e hex <file-on-brick>'
                                                          from both bricks?

                                                          Pranith

                                                          2014-08-05

                                                          11:25
                                                          GMT+03:00
                                                          Pranith Kumar
                                                          Karampuri <pkarampu@xxxxxxxxxx>:

                                                          On
                                                          08/05/2014
                                                          01:10 PM,
                                                          Roman wrote:

                                                          Ahha!

                                                          For some reason I was not
                                                          able to start the VM anymore;
                                                          Proxmox VE told me that it
                                                          could not read the qcow2
                                                          header because permission was
                                                          denied. So I just deleted
                                                          that file and created a new
                                                          VM. And the next message I
                                                          got was this:

                                                          Seems like these are the messages where you took down the bricks
                                                          before self-heal. Could you restart the run, waiting for self-heals
                                                          to complete before taking down the next brick?

                                                          Pranith

                                                          [2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 60 ] [ 11 0 ] ]
                                                          [2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background  data self-heal failed on /images/124/vm-124-disk-1.qcow2
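
A note on reading the pending matrix in the log above. One simplified way to think about it (an assumption for illustration, not AFR's actual code) is that entry [i][j] counts the operations brick i has recorded as still pending on brick j. With [ [ 0 60 ] [ 11 0 ] ] each brick blames the other, so neither can serve as the heal source, which matches the "possible split-brain" message. The function name classify_pending is hypothetical.

```python
def classify_pending(matrix):
    """Classify a pending matrix under a simplified model.

    matrix[i][j] is assumed to be the number of operations that brick i
    has recorded as pending on brick j (diagonal entries are ignored).
    This is an illustrative model, not the real AFR algorithm.
    """
    n = len(matrix)
    # A brick is "accused" if any other brick holds pending operations against it.
    accused = {
        j
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j] > 0
    }
    if not accused:
        return "clean"
    # A heal source is a brick that nobody accuses.
    sources = [i for i in range(n) if i not in accused]
    if not sources:
        return "split-brain"
    return f"heal from bricks {sources}"


# The matrix from the log above: both bricks accuse each other.
print(classify_pending([[0, 60], [11, 0]]))
```

If only one brick had recorded pending operations (for example [[0, 60], [0, 0]]), this model would pick brick 0 as the source and the heal could proceed.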

                                                          2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                                          I just responded to your earlier mail about how the log looks. The
                                                          log appears in the mount's logfile.

                                                          Pranith

                                                          On 08/05/2014 12:41 PM, Roman wrote:

                                                          Ok,

                                                          so I've waited long enough, I think. There was no traffic on the
                                                          switch ports between the servers. I could not find any suitable log
                                                          message about a completed self-heal (waited about 30 minutes). This
                                                          time I unplugged the other server's UTP cable and ended up in the
                                                          same situation:

                                                          root@gluster-test1:~# cat /var/log/dmesg
                                                          -bash: /bin/cat: Input/output error

                                                          brick logs:

                                                          [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
                                                          [2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
                                                          [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
                                                          [2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0

                                                          2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                                          Do you think it is possible for you to do these tests on the
                                                          latest version, 3.5.2? 'gluster volume heal <volname> info' would
                                                          give you that information in versions > 3.5.1.

                                                          Otherwise you will have to check it either from the logs (there
                                                          will be a self-heal completed message in the mount logs) or by
                                                          observing 'getfattr -d -m. -e hex <image-file-on-bricks>'
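
To make the getfattr check above concrete, here is a minimal sketch that decodes the trusted.afr.* values from 'getfattr -d -m. -e hex' output. It assumes each value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry), which is my understanding of the AFR changelog layout for this era. The function names and the sample values are hypothetical and are not taken from the thread.

```python
import struct


def parse_afr_xattrs(getfattr_output):
    """Return {xattr_name: (data, metadata, entry)} for trusted.afr.* lines.

    Assumes each value is 12 bytes: three big-endian unsigned 32-bit
    pending counters. Lines that do not match are skipped.
    """
    counters = {}
    for line in getfattr_output.splitlines():
        line = line.strip()
        if not line.startswith("trusted.afr.") or "=0x" not in line:
            continue
        name, hexval = line.split("=0x", 1)
        if len(hexval) != 24:  # 12 bytes -> 24 hex digits
            continue
        counters[name] = struct.unpack(">III", bytes.fromhex(hexval))
    return counters


def heal_pending(counters):
    """True if any counter on this brick is non-zero."""
    return any(any(values) for values in counters.values())


# Hypothetical sample output (values invented for illustration).
sample = """\
# file: exports/fast-test/150G/images/124/vm-124-disk-1.qcow2
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
trusted.afr.HA-fast-150G-PVE1-client-1=0x0000003c0000000000000000
trusted.gfid=0x00112233445566778899aabbccddeeff
"""

parsed = parse_afr_xattrs(sample)
print(parsed)
print("heal pending:", heal_pending(parsed))
```

Running the same check against the file on both bricks, the heal can be treated as finished once every counter on both sides reads zero.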

                                                          Pranith

                                                          On 08/05/2014 12:09 PM, Roman wrote:

                                                          Ok,

                                                          I understand. I will try this shortly. How can I be sure that the
                                                          healing process is done if I am not able to see its status?

                                                          2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                                          Mounts will do the healing, not the self-heal-daemon. The point, I
                                                          feel, is that whichever process does the healing needs to have the
                                                          latest information about the good bricks in this use case. Since
                                                          for the VM use case the mounts should have the latest information,
                                                          we should let the mounts do the healing. If the mount accesses the
                                                          VM image, either by someone doing operations inside the VM or by
                                                          an explicit stat on the file, it should do the healing.
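
As a rough illustration of the "explicit stat" mentioned above, the following sketch walks a directory tree and stats every file it finds. Pointed at the client mount (not at a brick), each stat is an access through the mount, which per the explanation above should give the mount a chance to heal the file. The helper name stat_tree is hypothetical, and whether a stat alone is sufficient on a given version is an assumption to verify.

```python
import os


def stat_tree(root):
    """Stat every file under root.

    Returns (ok_count, failed_paths). A failure such as an
    Input/output error is recorded rather than raised, so one bad
    file does not stop the walk.
    """
    ok_count = 0
    failed_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.stat(path)
                ok_count += 1
            except OSError:
                failed_paths.append(path)
    return ok_count, failed_paths
```

For the setup in this thread the call would look like stat_tree("/mnt/pve/FAST-TESt"); any paths in the second return value are candidates to inspect with getfattr on the bricks.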

                                                          Pranith.

                                                          On 08/05/2014 10:39 AM, Roman wrote:

                                                          Hmmm,

                                                          you told me to turn it off. Did I misunderstand something? After I
                                                          issued the command you sent me, I was not able to watch the healing
                                                          process; it said it won't be healed because it's turned off.

                                                          2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                                          You didn't mention anything about self-healing. Did you wait until
                                                          the self-heal is complete?

                                                          Pranith

                                                          On 08/04/2014 05:49 PM, Roman wrote:

                                                          Hi!
                                                          The result is pretty much the same. I set the switch port down for
                                                          the 1st server, and it was ok. Then I set it back up and set the
                                                          other server's port off, and it triggered IO errors on two virtual
                                                          machines: one with a local root FS but network-mounted storage,
                                                          and the other with a network root FS. The 1st gave an error on
                                                          copying to or from the mounted network disk; the other gave me an
                                                          error even for reading log files.

                                                          cat: /var/log/alternatives.log: Input/output error

                                                          Then I reset the KVM VM and it told me there is no boot device.
                                                          Next I virtually powered it off and then back on, and it booted.

                                                          By the way, did I have to start/stop the volume?

                                                          >> Could you do the following and test it again?
                                                          >> gluster volume set <volname> cluster.self-heal-daemon off
                                                          >> Pranith

                                                          2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                                          On 08/04/2014 03:33 PM, Roman wrote:

                                                          Hello!

                                                          I'm facing the same problem as mentioned here:

                                                          http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html

                                                          My setup is up and running, so I'm ready to help you back with
                                                          feedback.

                                                          Setup:
                                                          proxmox server as client
                                                          2 gluster physical servers

                                                          Server side and client side are both currently running glusterfs
                                                          3.4.4 from the gluster repo.

                                                          The problem is:

                                                          1. created replica bricks.
                                                          2. mounted in proxmox (tried both proxmox ways: via GUI and fstab
                                                             (with backup volume line); btw, while mounting via fstab I'm
                                                             unable to launch a VM without cache, while direct-io-mode is
                                                             enabled in the fstab line)
                                                          3. installed VM
                                                          4. bring one volume down - ok
                                                          5. bringing it up, waiting for sync to be done.
                                                          6. bring other volume down - getting IO errors on VM guest and not
                                                             able to restore the VM after I reset the VM via host. It says
                                                             (no bootable media). After I shut it down (forced) and bring it
                                                             back up, it boots.

                                                          Could you do the following and test it again?

                                                          gluster volume set <volname> cluster.self-heal-daemon off

                                                          Pranith

                                                          Need help. Tried 3.4.3, 3.4.4. Still missing packages for 3.4.5
                                                          for Debian and for 3.5.2 (3.5.1 always gives a healing error for
                                                          some reason)

                                                          -- 

                                                          Best regards,

                                                          Roman. 

                                                          _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users

-- 
Best regards,
Roman.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users