Re: libgfapi failover problem on replica bricks

Roman <romeo.r@xxxxxxxxx> · Fri, 8 Aug 2014 09:05:46 +0300

Just to be sure: why do you guys build an updated glusterfs package for wheezy if it cannot be installed on wheezy? :)

2014-08-08 9:03 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:

Oh, unfortunately I won't be able to install either 3.5.2 or 3.4.5 :( They both require a libc6 update, and I would rather not risk that.

 glusterfs-common : Depends: libc6 (>= 2.14) but 2.13-38+deb7u3 is to be installed
                    Depends: liblvm2app2.2 (>= 2.02.106) but 2.02.95-8 is to be installed
                    Depends: librdmacm1 (>= 1.0.16) but 1.0.15-1+deb7u1 is to be installed

2014-08-07 15:32 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:

I'm really sorry to bother you, but it seems all my previous tests were a waste of time because of those files generated from /dev/zero :). That's both good and bad news. Now I use real files for my tests. As it is almost my last workday, the only things I want to do are test and document :) .. so here are some new results:

So this time I've got two gluster volumes:

1. with cluster.self-heal-daemon off
2. with cluster.self-heal-daemon on

1. real results with SHD off:

Everything seems to work as expected. The VM survives outages of both glusterfs servers, and I can see the sync in the network traffic. FINE!

Sometimes healing starts a bit late (it takes anywhere from 1 minute to 1 hour to sync). I don't know why. Ideas?

2. test results on server with SHD on:

The VM is not able to survive the second server restart (as described previously). It gives I/O errors, although the files are synced. Maybe some locks prevent the KVM hypervisor from reconnecting to the storage in time?

So the problem actually is sparse files inside a VM :). If one uses them (generated from /dev/zero, for example), the VM will crash and never come up due to errors in the qcow2 file headers. Another bug?

2014-08-07 9:53 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:

OK, then I hope we will be able to test it in two weeks. Thanks for your time and patience.

2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

    On 08/07/2014 12:17 PM, Roman wrote:

      Well, one thing is definitely true: if there is no healing daemon
      running, I'm not able to start the VM after an outage. It seems the
      qcow2 file is corrupted (KVM is unable to read its header).

    We shall look at this again once I have the document with all the steps
    that need to be carried out :-)

    Pranith

        2014-08-07 9:35 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:

            > This should not happen if you do the writes, let's say,
            > from '/dev/urandom' instead of '/dev/zero'

              Somewhere deep inside me I thought so! Zero is zero :)

            > I will provide you with a document for testing this issue
            > properly. I have a lot going on in my day job, so I am not
            > getting enough time to write it up. The weekend is approaching
            > and I will definitely get a bit of time then, so I will send
            > you the document over the weekend.

              Thanks a lot. I'll wait. My vacation starts tomorrow and I'll
              be out for two weeks, so there is no need to hurry.

                2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                    On 08/07/2014 11:48 AM, Roman wrote:

                      How can they be in sync if they are different in size?
                      And why then is the VM not able to survive a gluster
                      outage? I really want to use glusterfs in our production
                      for infrastructure virtualization because of its simple
                      setup, but I'm not able to at the moment. Maybe you've
                      got some testing agenda? Or could you list the steps for
                      testing this properly, so that our VMs survive the
                      outages?

                    This is because of sparse files. http://en.wikipedia.org/wiki/Sparse_file

                    This should not happen if you do the writes, let's say,
                    from '/dev/urandom' instead of '/dev/zero'
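
The sparse-file effect mentioned above can be reproduced without Gluster. The following is a minimal sketch, not anything taken from GlusterFS itself: it creates one file as a hole and one by writing zeros explicitly, then compares the apparent size, the allocated size (what `du` reports) and a content hash. The helper names are hypothetical, SHA-256 is used here as a stand-in for the `md5sum` runs in this thread, and whether the first file really ends up sparse depends on the filesystem.

```python
import hashlib
import os
import tempfile


def make_sparse(path, size):
    """Create a file of `size` bytes without writing any data (one big hole)."""
    with open(path, "wb") as f:
        f.truncate(size)


def make_dense_zeros(path, size, chunk=1 << 20):
    """Create a file of `size` bytes by explicitly writing zero bytes."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(b"\0" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def sizes(path):
    """Return (apparent_bytes, allocated_bytes); st_blocks is in 512-byte units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def sha256_of(path, chunk=1 << 20):
    """Hash the file content in chunks; holes read back as zeros."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sparse = os.path.join(d, "sparse.img")
        dense = os.path.join(d, "dense.img")
        make_sparse(sparse, 4 << 20)
        make_dense_zeros(dense, 4 << 20)
        for p in (sparse, dense):
            apparent, allocated = sizes(p)
            print(os.path.basename(p), apparent, allocated, sha256_of(p)[:12])
```

On a filesystem that supports holes, the two files typically report the same apparent size and the same hash but different allocated sizes, which is the same pattern as identical `md5sum` results with different `du -sh` values on the two bricks.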

                    I will provide you with a document for testing this
                    issue properly. I have a lot going on in my day job,
                    so I am not getting enough time to write it up. The
                    weekend is approaching and I will definitely get a bit
                    of time then, so I will send you the document over the
                    weekend.

                    Pranith

                        We would like to be sure that when one of the
                          storage servers is down, the VMs keep running -
                          this is OK, we see this.
                        We would like to be sure that the data is synced
                          after the server is back up - we can't see that
                          at the moment.
                        We would like to be sure that the VMs fail over to
                          the second storage server during the outage - we
                          can't see this at the moment :(

                        2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                              On 08/07/2014 11:33 AM, Roman wrote:

                                File size increases because of me :) I generate
                                  files on the VM from /dev/zero during the
                                  outage of one server. Then I bring the downed
                                  server back up, and it seems the files never
                                  sync. I'll keep testing today. I can't read
                                  much from the logs either :(. This morning
                                  both VMs (one on the volume with self-healing
                                  and the other on the volume without it)
                                  survived the second server's outage (the first
                                  server was down yesterday); although the file
                                  sizes differ, the VMs ran without problems.
                                  But I had restarted them before bringing the
                                  second gluster server down.

                              Then there is no bug :-). It seems the files
                              are already in sync according to the extended
                              attributes you have pasted. How do you test
                              whether the files are in sync or not?
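
For readers following along, the extended attributes referred to here are the `trusted.afr.<volume>-client-N` values in the getfattr output quoted further down. A minimal sketch of how such a value can be read, assuming the usual AFR changelog layout of three big-endian 32-bit pending counters (data, metadata, entry); the function names are hypothetical and this is not GlusterFS code:

```python
import struct


def decode_afr(hex_value):
    """Split a trusted.afr.* value into (data, metadata, entry) pending counters.

    Assumes the value is 12 bytes: three big-endian unsigned 32-bit integers.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return struct.unpack(">III", raw)


def in_sync(hex_values):
    """True when every pending counter in every xattr is zero."""
    return all(not any(decode_afr(v)) for v in hex_values)


if __name__ == "__main__":
    # Values copied from the getfattr output for vm-125-disk-1.qcow2 in this thread.
    pasted = [
        "0x000000000000000000000000",  # trusted.afr.HA-MED-PVE1-1T-client-0
        "0x000000000000000000000000",  # trusted.afr.HA-MED-PVE1-1T-client-1
    ]
    print(in_sync(pasted))
    # Value from the older vm-127 output: non-zero data counter.
    print(decode_afr("0x000001320000000000000000"))
```

Under that assumption, all-zero values on both bricks mean no heal is pending, which matches the statement that the files are already in sync even though `du` reports different sizes.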

                              Pranith

                                  So I'm a bit lost at the moment. I'll try to
                                    keep my testing organized and report here
                                    what happens.

                                  2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

                                          On 08/07/2014 10:46 AM, Roman wrote:

                                            yes, they do.

                                                getfattr: Removing leading '/' from absolute path names
                                                # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
                                                trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
                                                trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa

                                                root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                root@stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2

                                                root@stor2:~# getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                getfattr: Removing leading '/' from absolute path names
                                                # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
                                                trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
                                                trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa

                                                root@stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                root@stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                                2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
                                        I think the files differ in size
                                        because of the sparse file healing
                                        issue. Could you raise a bug with
                                        steps to re-create this issue, in
                                        which the size of the file increases
                                        after healing?

                                            Pranith

                                                  2014-08-06 12:49
                                                  GMT+03:00 Humble
                                                  Chirammal <hchiramm@xxxxxxxxxx>:

                                                      ----- Original Message -----
                                                      | From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
                                                      | To: "Roman" <romeo.r@xxxxxxxxx>
                                                      | Cc: gluster-users@xxxxxxxxxxx, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Humble Chirammal" <hchiramm@xxxxxxxxxx>
                                                      | Sent: Wednesday, August 6, 2014 12:09:57 PM
                                                      | Subject: Re: libgfapi failover problem on replica bricks
                                                      |
                                                      | Roman,
                                                      |      The file went into split-brain. I think we should do these tests
                                                      | with 3.5.2, where monitoring the heals is easier. Let me also come up
                                                      | with a document about how to do the testing you are trying to do.
                                                      |
                                                      | Humble/Niels,
                                                      |      Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
                                                      | issue where /usr/bin/glfsheal was not packaged along with the deb. I
                                                      | think that should be fixed now as well?
                                                      |

                                                    Pranith,

                                                    The 3.5.2 packages for Debian are not available yet. We are
                                                    coordinating internally to get them processed.

                                                    I will update the list once they are available.

                                                    --Humble

                                                      |
                                                      | On 08/06/2014 11:52 AM, Roman wrote:
                                                      | > good morning,
                                                      | >
                                                      | > root@stor1:~# getfattr -d -m. -e hex
                                                      | > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                      | > getfattr: Removing leading '/' from absolute path names
                                                      | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                      | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
                                                      | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
                                                      | > trusted.gfid=0x23c79523075a4158bea38078da570449
                                                      | >
                                                      | > getfattr: Removing leading '/' from absolute path names
                                                      | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                      | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
                                                      | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
                                                      | > trusted.gfid=0x23c79523075a4158bea38078da570449
                                                      | >
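
The two getfattr outputs above are the basis for the split-brain diagnosis earlier in the thread. A rough sketch of that reasoning follows, under several assumptions that are not stated in the thread: stor1 is taken to be the brick behind `client-0`, the second output (whose prompt is missing) is taken to come from the other brick, each value is read as three big-endian 32-bit pending counters, and each brick's counters about itself are ignored for simplicity. The names `pending` and `classify` are hypothetical.

```python
import struct


def pending(hex_value):
    """Decode a trusted.afr.* value into (data, metadata, entry) counters."""
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    return struct.unpack(">III", bytes.fromhex(text))


def classify(views):
    """Classify a replica set from per-brick xattr views.

    views[i][j] is the trusted.afr value that brick i stores about brick j.
    A non-zero counter means brick i records operations that brick j missed.
    """
    n = len(views)
    accuses = [[any(pending(views[i][j])) for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each brick claims the other one is stale: no clean heal source.
            if accuses[i][j] and accuses[j][i]:
                return "split-brain"
    if any(accuses[i][j] for i in range(n) for j in range(n) if i != j):
        return "heal pending"
    return "in sync"


if __name__ == "__main__":
    vm127 = [
        # first output (root@stor1): client-0, client-1
        ["0x000000000000000000000000", "0x000001320000000000000000"],
        # second output (assumed to be the other brick): client-0, client-1
        ["0x000000040000000000000000", "0x000000000000000000000000"],
    ]
    print(classify(vm127))
```

With the values from this thread, each brick holds a non-zero data counter against the other, which under the assumptions above is the mutual-accusation pattern described as split-brain.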

                                                      | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
                                                      | >
                                                      | >     On 08/06/2014 11:30 AM, Roman wrote:
                                                      | >>     Also, this time files are not the same!
                                                      | >>
                                                      | >>     root@stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                      | >>     32411360c53116b96a059f17306caeda  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                      | >>
                                                      | >>     root@stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                      | >>     65b8a6031bcb6f5fb3a11cb1e8b1c9c9  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
                                                      | >     What is the getfattr output?
                                                      | >
                                                      | >     Pranith
                                                      | >
                                                      | >>

                                                      | >>     2014-08-05 16:33 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
                                                      | >>
                                                      | >>         Nope, it is not working. But this time it went a bit differently
                                                      | >>
                                                      | >>         root@gluster-client:~# dmesg
                                                      | >>         Segmentation fault
                                                      | >>
                                                      | >>         I was not even able to start the VM after I had done the tests
                                                      | >>
                                                      | >>         Could not read qcow2 header: Operation not permitted
                                                      | >>
                                                      | >>         And it seems it never starts to sync files after the first
                                                      | >>         disconnect. The VM survives the first disconnect, but not the
                                                      | >>         second (I waited around 30 minutes). Also, I've got
                                                      | >>         network.ping-timeout: 2 in the volume settings, but the logs
                                                      | >>         reacted to the first disconnect only after around 30 seconds.
                                                      | >>         The second was faster: 2 seconds.
                                                      | >>
                                                      | >>         The reaction was different as well:
                                                      | >>

                                                      | >>         slower one:
                                                      | >>         [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv] 0-glusterfs: readv failed (Connection timed out)
                                                      | >>         [2014-08-05 13:26:19.558485] W [socket.c:1962:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:24007)
                                                      | >>         [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
                                                      | >>         [2014-08-05 13:26:21.281474] W [socket.c:1962:__socket_proto_state_machine] 0-HA-fast-150G-PVE1-client-0: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:49153)
                                                      | >>         [2014-08-05 13:26:21.281507] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-0: disconnected
                                                      | >>

                                                      | >>        
                                                      the fast one:

                                                      | >>        
                                                      2014-08-05
                                                      12:52:44.607389] C

                                                      | >>        
[client-handshake.c:127:rpc_client_ping_timer_expired]

                                                      | >>        
                                                      0-HA-fast-150G-PVE1-client-1:

                                                      server 10.250.0.2:49153

                                                    | >>        
                                                    <http://10.250.0.2:49153>

                                                    has not responded in
                                                    the last 2

                                                      | >>  
                                                              seconds,
                                                        disconnecting.

                                                        | >>      
                                                          [2014-08-05
                                                        12:52:44.607491]
                                                        W
                                                        [socket.c:514:__socket_rwv]

                                                        | >>      

                                                        0-HA-fast-150G-PVE1-client-1:
                                                        readv failed (No
                                                        data available)

| >> [2014-08-05 12:52:44.607585] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2014-08-05 12:52:42.463881 (xid=0x381883x)

| >> [2014-08-05 12:52:44.607604] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-HA-fast-150G-PVE1-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)

| >> [2014-08-05 12:52:44.607736] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2014-08-05 12:52:42.463891 (xid=0x381884x)

| >> [2014-08-05 12:52:44.607753] W [client-handshake.c:276:client_ping_cbk] 0-HA-fast-150G-PVE1-client-1: timer must have expired

| >> [2014-08-05 12:52:44.607776] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-1: disconnected
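As a quick sanity check on the log above, the gap between the time the unwound LOOKUP frame was issued and the moment the ping timer expired can be computed from the two quoted timestamps. This is only a minimal Python sketch; the helper name `elapsed_seconds` is made up for illustration and is not part of any GlusterFS tooling.

```python
from datetime import datetime

# Timestamp layout used in the quoted client log lines.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


called_at = "2014-08-05 12:52:42.463881"   # "called at" time of the LOOKUP frame
expired_at = "2014-08-05 12:52:44.607389"  # rpc_client_ping_timer_expired entry

gap = elapsed_seconds(called_at, expired_at)
print(f"{gap:.6f} s between the last request and the disconnect")
```

The result is a little over two seconds, which is consistent with the "has not responded in the last 2 seconds" message in the first log entry.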

| >>
| >> I've got SSD disks (just for info).
| >> Should I go ahead and give 3.5.2 a try?

| >>
| >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

| >>
| >> reply along with gluster-users please :-). Maybe you are hitting 'reply' instead of 'reply all'?
| >>
| >> Pranith

| >>
| >> On 08/05/2014 03:35 PM, Roman wrote:

| >>> To be sure and keep things clean, I've created another VM with raw format and am going to repeat those steps. So now I've got two VMs, one with qcow2 format and the other with raw format. I will send another e-mail shortly.

| >>>
| >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

| >>>
| >>> On 08/05/2014 03:07 PM, Roman wrote:

| >>>> really, seems like the same file
| >>>>
| >>>> stor1:
| >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>> stor2:
| >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

| >>>>
| >>>> One thing I've seen in the logs: is Proxmox VE somehow connecting to the servers with the wrong version?
| >>>> [2014-08-05 09:23:45.218550] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)

| >>> It is the RPC (over-the-network data structures) version, which has not changed at all since 3.3, so that's not a problem. So what is the conclusion? Is your test case working now or not?
| >>>
| >>> Pranith
| >>>

| >>>> but if I issue:
| >>>> root@pve1:~# glusterfs -V
| >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
| >>>> seems ok.

| >>>>
| >>>> The servers use 3.4.4, meanwhile:
| >>>> [2014-08-05 09:23:45.117875] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 (version: 3.4.4)
| >>>> [2014-08-05 09:23:49.103035] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)

| >>>>
| >>>> if this could be the reason, of course.
| >>>> I did restart the Proxmox VE yesterday (just for information).

| >>>>
| >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

| >>>>
| >>>> On 08/05/2014 02:33 PM, Roman wrote:

| >>>>> Waited long enough for now, still different sizes and no logs about healing :(

| >>>>>
| >>>>> stor1
| >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>> root@stor1:~# du -sh /exports/fast-test/150G/images/127/
| >>>>> 1.2G    /exports/fast-test/150G/images/127/

| >>>>>
| >>>>> stor2
| >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>> root@stor2:~# du -sh /exports/fast-test/150G/images/127/
| >>>>> 1.4G    /exports/fast-test/150G/images/127/
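The two bricks report 1.2G and 1.4G, yet the md5sums quoted elsewhere in this thread are identical. One possible explanation (an assumption, not something confirmed in the thread) is that `du` counts allocated blocks rather than file length, so a sparse image can occupy different amounts of disk on each brick while holding the same bytes. A minimal Python sketch of that distinction on a throwaway file, assuming a Unix-like system where `st_blocks` is counted in 512-byte units; the helper name `sizes` is made up for illustration:

```python
import os
import tempfile


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_size is the file length; st_blocks counts the 512-byte blocks
    # actually allocated on disk (what `du` reports on Linux).
    return st.st_size, st.st_blocks * 512


# Create a 10 MiB file without writing any data, so it may be stored sparsely.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(10 * 1024 * 1024)
    demo_path = f.name

apparent, allocated = sizes(demo_path)
print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
os.remove(demo_path)
```

On filesystems that support sparse files the allocated figure is typically far smaller than the apparent one. Comparing both numbers for the image on each brick would show whether the `du` difference is only an allocation difference.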

| >>>> According to the changelogs, the file doesn't need any healing. Could you stop the operations on the VMs and take md5sum on both these machines?
| >>>>
| >>>> Pranith
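To illustrate how the changelog values quoted above can be read: the sketch below assumes the usual AFR layout of three 32-bit big-endian counters (pending data, metadata and entry operations) in each `trusted.afr.*` value. That layout is an assumption on my part and the function name `decode_afr_changelog` is hypothetical, so treat this as a reading aid rather than an official tool.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* hex value into its three pending counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed layout: three unsigned 32-bit integers in network byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Value reported for both client-0 and client-1 on stor1 and stor2.
pending = decode_afr_changelog("0x000000000000000000000000")
needs_heal = any(pending.values())
print(pending, "needs heal:", needs_heal)
```

With every counter at zero on both bricks, neither side accuses the other, which matches the statement that the file doesn't need any healing.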

| >>>>
| >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

| >>>>>
| >>>>> On 08/05/2014 02:06 PM, Roman wrote:

                                                        Well, it seems like it doesn't see that changes were made to the
                                                        volume? I created two files, 200 MB and 100 MB (from /dev/zero),
                                                        after I disconnected the first brick. Then I connected it back
                                                        and got these logs:

                                                        [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
                                                        [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
                                                        [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
                                                        [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
                                                        [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
                                                        [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
                                                        [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
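
To pick out the warnings in a log like the excerpt above, the lines can be split into fields. The following is a rough sketch, not part of the original message. The line layout (`[timestamp] LEVEL [file:line:function] origin: message`) is inferred only from the seven lines quoted here and may not cover every GlusterFS log line. The names `LOG_LINE`, `LogEntry`, `parse_log_line` and `warnings_and_errors` are hypothetical, and treating `W`, `E` and `C` as the noteworthy levels is an assumption.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the quoted excerpt only:
#   [timestamp] LEVEL [source:lineno:function] origin: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^:\]]+):(?P<lineno>\d+):(?P<function>[^\]]+)\] "
    r"(?P<origin>[^:]+): "
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    source: str
    lineno: int
    function: str
    origin: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not fit the assumed layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        timestamp=match.group("timestamp"),
        level=match.group("level"),
        source=match.group("source"),
        lineno=int(match.group("lineno")),
        function=match.group("function"),
        origin=match.group("origin"),
        message=match.group("message"),
    )


def warnings_and_errors(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only entries whose level is W, E or C (assumed severity letters)."""
    entries = (parse_log_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level in ("W", "E", "C")]
```

Applied to the excerpt, this would leave only the `readv failed (No data available)` line from `socket.c`. All other entries are at level `I`.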

                                                        [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

                                                        this line seems weird to me tbh.

                                                        I do not see any traffic on switch interfaces between gluster servers, which

...

[Message not shown in full]

-- 
Best regards,
Roman.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users