Re: libgfapi failover problem on replica bricks


 



Just to be sure: why do you guys build an updated glusterfs package for wheezy if it can't be installed on wheezy? :)


2014-08-08 9:03 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
Oh, unfortunately I won't be able to install either 3.5.2 or 3.4.5 :( Both require a libc6 update, and I would rather not risk that.

 glusterfs-common : Depends: libc6 (>= 2.14) but 2.13-38+deb7u3 is to be installed
                    Depends: liblvm2app2.2 (>= 2.02.106) but 2.02.95-8 is to be installed
                    Depends: librdmacm1 (>= 1.0.16) but 1.0.15-1+deb7u1 is to be installed



2014-08-07 15:32 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
I'm really sorry to bother you, but it seems all my previous tests were a waste of time because they used files generated from /dev/zero :). That's both good and bad news. Now I use real files for my tests. As this is almost my last workday, the only things I want to do are test and document :) .. so here are some new results:

So this time I've got two gluster volumes:

1. with cluster.self-heal-daemon off
2. with cluster.self-heal-daemon on

1. real results with SHD off:
Seems like everything is working as expected. The VM survives outages of both glusterfs servers, and I'm able to see the sync in the network traffic. FINE!

Sometimes healing starts a bit late (it takes anywhere from 1 minute to 1 hour to sync). I don't know why. Ideas?

2. test results on server with SHD on:
The VM does not survive the second server restart (as described previously). It gives I/O errors, although the files are synced. Could some locks be preventing the KVM hypervisor from reconnecting to the storage in time?


So the problem actually is sparse files inside a VM :). If one uses them (e.g. generated from /dev/zero), the VM will crash and never come up again due to errors in the qcow2 file headers. Another bug?







2014-08-07 9:53 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
Ok, then I hope that we will be able to test it in two weeks. Thanks for your time and patience.


2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 08/07/2014 12:17 PM, Roman wrote:
Well, one thing is definitely true: If there is no healing daemon running, I'm not able to start the VM after outage. Seems like the qcow2 file is corrupted (KVM unable to read its header).
We shall see this again once I have the document with all the steps that need to be carried out :-)

Pranith


2014-08-07 9:35 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
This should not happen if you do the writes from, let's say, '/dev/urandom' instead of '/dev/zero'

Somewhere deep inside me I thought so ! zero is zero :)

>I will provide you with a document for testing this issue properly. I have a lot going on in my day job so not getting enough time to write that out. Considering the weekend is approaching I will get a bit of time definitely over the weekend so I will send you the document over the weekend.

Thanks a lot. I'll wait. My vacation starts tomorrow and I'll be out for two weeks, so there is no hurry.




2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 08/07/2014 11:48 AM, Roman wrote:
How can they be in sync if they differ in size? And why, then, is the VM not able to survive a gluster outage? I really want to use glusterfs in production for our infrastructure virtualization because of its simple setup, but I'm not able to at the moment. Maybe you've got a test plan? Or could you list the steps for testing this properly, so that our VMs survive the outages?
This is because of sparse files. http://en.wikipedia.org/wiki/Sparse_file
This should not happen if you do the writes from, let's say, '/dev/urandom' instead of '/dev/zero'
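
For anyone following along, here is a minimal Python sketch (not from the original thread) of why two copies of a file can have the same md5sum while du reports different sizes. It creates one file as a hole and one by actually writing zeros, then compares apparent size, allocated size and checksum. The function and file names are made up for illustration, and how many blocks get allocated depends on the underlying filesystem.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def describe(path):
    """Collect apparent size, allocated bytes (what du counts) and MD5."""
    st = os.stat(path)
    return {
        "apparent": st.st_size,            # what ls -l shows
        "allocated": st.st_blocks * 512,   # st_blocks is in 512-byte units
        "md5": md5_of(path),
    }


def sparse_vs_dense(size=4 * 1024 * 1024):
    """Create a sparse and a fully written zero file of equal length."""
    with tempfile.TemporaryDirectory() as d:
        sparse = os.path.join(d, "sparse.img")  # hypothetical file names
        dense = os.path.join(d, "dense.img")
        with open(sparse, "wb") as f:
            f.truncate(size)                    # extends the file as a hole
        with open(dense, "wb") as f:
            f.write(b"\0" * size)               # really writes the zeros
            f.flush()
            os.fsync(f.fileno())
        return describe(sparse), describe(dense)


if __name__ == "__main__":
    s, d = sparse_vs_dense()
    print("sparse:", s)
    print("dense: ", d)
```

On most Linux filesystems the sparse copy should show far fewer allocated bytes than the dense one, while the apparent size and the checksum are identical. That matches the picture in this thread: same md5sum on both bricks, different du output.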

I will provide you with a document for testing this issue properly. I have a lot going on in my day job so not getting enough time to write that out. Considering the weekend is approaching I will get a bit of time definitely over the weekend so I will send you the document over the weekend.

Pranith

We would like to be sure that when one of the storage servers is down, the VMs keep running - this is OK, we see this.
We would like to be sure that the data is synced after the server comes back up - we can't see that atm
We would like to be sure that the VMs fail over to the second storage server during the outage - we can't see this atm
:(


2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 08/07/2014 11:33 AM, Roman wrote:
The file size increases because of me :) I generate files on the VM from /dev/zero during the outage of one server. Then I bring the downed server back up and it seems the files never sync. I'll keep on testing today. I can't read much from the logs either :(. This morning both VMs (one on the volume with self-healing and the other on the volume without it) survived the second server's outage (the first server was down yesterday); although the file sizes differ, the VMs ran without problems. But I had restarted them before bringing the second gluster server down.
Then there is no bug :-). It seems the files are already in sync according to the extended attributes you have pasted. How do you test whether the files are in sync or not?
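
A side note on reading the trusted.afr.* values pasted in this thread, as a hedged sketch rather than an official tool: to my understanding each value is 12 bytes holding three 32-bit big-endian counters of pending data, metadata and entry operations that this brick records against the named client/brick. All zeros means nothing is pending. The helper names below are made up for illustration.

```python
import struct


def parse_afr_changelog(hex_value):
    """Decode a trusted.afr.* value as printed by `getfattr -e hex`.

    Assumes the layout is three unsigned 32-bit big-endian counters:
    pending data, metadata and entry operations.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def has_pending(hex_value):
    """True if any of the three counters is non-zero."""
    return any(parse_afr_changelog(hex_value).values())


def mutual_blame(brick0_about_brick1, brick1_about_brick0):
    """True if each brick records pending operations against the other."""
    return has_pending(brick0_about_brick1) and has_pending(brick1_about_brick0)


if __name__ == "__main__":
    # Values quoted further down in this thread for vm-127-disk-1.qcow2.
    print(parse_afr_changelog("0x000001320000000000000000"))
    print(parse_afr_changelog("0x000000040000000000000000"))
    print(mutual_blame("0x000001320000000000000000",
                       "0x000000040000000000000000"))
```

With the all-zero values pasted above, has_pending() is false for both bricks, which is why the changelogs say no heal is needed. With the vm-127 values quoted further down, each brick holds a non-zero data counter against the other, which is the pattern described there as split-brain.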

Pranith

So I'm a bit lost at the moment. I'll try to keep my tests organized and report here what happens.


2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 08/07/2014 10:46 AM, Roman wrote:
yes, they do.

getfattr: Removing leading '/' from absolute path names
# file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa

root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2



root@stor2:~# getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa

root@stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
I think the files differ in size because of the sparse file healing issue. Could you raise a bug with steps to re-create this issue, where the size of the file increases after healing?

Pranith




2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm@xxxxxxxxxx>:



----- Original Message -----
| From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
| To: "Roman" <romeo.r@xxxxxxxxx>
| Cc: gluster-users@xxxxxxxxxxx, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Humble Chirammal" <hchiramm@xxxxxxxxxx>
| Sent: Wednesday, August 6, 2014 12:09:57 PM
| Subject: Re: libgfapi failover problem on replica bricks
|
| Roman,
|      The file went into split-brain. I think we should do these tests
| with 3.5.2, where monitoring the heals is easier. Let me also come up
| with a document about how to do the testing you are trying to do.
|
| Humble/Niels,
|      Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
| issue where /usr/bin/glfsheal was not packaged along with the deb. I
| think that should be fixed now as well?
|
Pranith,

The 3.5.2 packages for Debian are not available yet. We are co-ordinating internally to get them processed.
I will update the list once they are available.

--Humble
|
| On 08/06/2014 11:52 AM, Roman wrote:
| > good morning,
| >
| > root@stor1:~# getfattr -d -m. -e hex
| > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| >
| >
| > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx
| > <mailto:pkarampu@xxxxxxxxxx>>:
| >
| >
| >     On 08/06/2014 11:30 AM, Roman wrote:
| >>     Also, this time files are not the same!
| >>
| >>     root@stor1:~# md5sum
| >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>     32411360c53116b96a059f17306caeda
| >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>
| >>     root@stor2:~# md5sum
| >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>     65b8a6031bcb6f5fb3a11cb1e8b1c9c9
| >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >     What is the getfattr output?
| >
| >     Pranith
| >
| >>
| >>
| >>     2014-08-05 16:33 GMT+03:00 Roman <romeo.r@xxxxxxxxx
| >>     <mailto:romeo.r@xxxxxxxxx>>:
| >>
| >>         Nope, it is not working. But this time it went a bit differently
| >>
| >>         root@gluster-client:~# dmesg
| >>         Segmentation fault
| >>
| >>
| >>         I was not even able to start the VM after I had done the tests
| >>
| >>         Could not read qcow2 header: Operation not permitted
| >>
| >>         And it seems it never starts to sync files after the first
| >>         disconnect. The VM survives the first disconnect, but not the
| >>         second (I waited around 30 minutes). Also, I've
| >>         got network.ping-timeout: 2 in the volume settings, but the logs
| >>         reacted to the first disconnect only after around 30 seconds.
| >>         The second was faster, 2 seconds.
| >>
| >>         Reaction was different also:
| >>
| >>         slower one:
| >>         [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv]
| >>         0-glusterfs: readv failed (Connection timed out)
| >>         [2014-08-05 13:26:19.558485] W
| >>         [socket.c:1962:__socket_proto_state_machine] 0-glusterfs:
| >>         reading from socket failed. Error (Connection timed out),
| >>         peer (10.250.0.1:24007)
| >>         [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv]
| >>         0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
| >>         [2014-08-05 13:26:21.281474] W
| >>         [socket.c:1962:__socket_proto_state_machine]
| >>         0-HA-fast-150G-PVE1-client-0: reading from socket failed.
| >>         Error (Connection timed out), peer (10.250.0.1:49153)
| >>         [2014-08-05 13:26:21.281507] I
| >>         [client.c:2098:client_rpc_notify]
| >>         0-HA-fast-150G-PVE1-client-0: disconnected
| >>
| >>         the fast one:
| >>         [2014-08-05 12:52:44.607389] C
| >>         [client-handshake.c:127:rpc_client_ping_timer_expired]
| >>         0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
| >>         has not responded in the last 2
| >>         seconds, disconnecting.
| >>         [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv]
| >>         0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
| >>         [2014-08-05 12:52:44.607585] E
| >>         [rpc-clnt.c:368:saved_frames_unwind]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >>         [0x7fcb1b4b0558]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >>         [0x7fcb1b4aea63]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >>         unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at
| >>         2014-08-05 12:52:42.463881 (xid=0x381883x)
| >>         [2014-08-05 12:52:44.607604] W
| >>         [client-rpc-fops.c:2624:client3_3_lookup_cbk]
| >>         0-HA-fast-150G-PVE1-client-1: remote operation failed:
| >>         Transport endpoint is not connected. Path: /
| >>         (00000000-0000-0000-0000-000000000001)
| >>         [2014-08-05 12:52:44.607736] E
| >>         [rpc-clnt.c:368:saved_frames_unwind]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >>         [0x7fcb1b4b0558]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >>         [0x7fcb1b4aea63]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >>         unwinding frame type(GlusterFS Handshake) op(PING(3)) called
| >>         at 2014-08-05 12:52:42.463891 (xid=0x381884x)
| >>         [2014-08-05 12:52:44.607753] W
| >>         [client-handshake.c:276:client_ping_cbk]
| >>         0-HA-fast-150G-PVE1-client-1: timer must have expired
| >>         [2014-08-05 12:52:44.607776] I
| >>         [client.c:2098:client_rpc_notify]
| >>         0-HA-fast-150G-PVE1-client-1: disconnected
| >>
| >>
| >>
| >>         I've got SSD disks (just for info).
| >>         Should I go and give 3.5.2 a try?
| >>
| >>
| >>
| >>         2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri
| >>         <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>
| >>             Reply along with gluster-users please :-). Maybe you are
| >>             hitting 'reply' instead of 'reply all'?
| >>
| >>             Pranith
| >>
| >>             On 08/05/2014 03:35 PM, Roman wrote:
| >>>             To be sure and start clean, I've created another VM with raw
| >>>             format and am going to repeat those steps. So now I've got
| >>>             two VMs, one with qcow2 format and the other with raw
| >>>             format. I will send another e-mail shortly.
| >>>
| >>>
| >>>             2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri
| >>>             <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>>
| >>>
| >>>                 On 08/05/2014 03:07 PM, Roman wrote:
| >>>>                 really, seems like the same file
| >>>>
| >>>>                 stor1:
| >>>>                 a951641c5230472929836f9fcede6b04
| >>>>                  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>>                 stor2:
| >>>>                 a951641c5230472929836f9fcede6b04
| >>>>                  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>>
| >>>>                 one thing I've seen from logs, that somehow proxmox
| >>>>                 VE is connecting with wrong version to servers?
| >>>>                 [2014-08-05 09:23:45.218550] I
| >>>>                 [client-handshake.c:1659:select_server_supported_programs]
| >>>>                 0-HA-fast-150G-PVE1-client-0: Using Program
| >>>>                 GlusterFS 3.3, Num (1298437), Version (330)
| >>>                 It is the rpc (over the network data structures)
| >>>                 version, which has not changed at all since 3.3, so
| >>>                 that's not a problem. So what is the conclusion? Is
| >>>                 your test case working now or not?
| >>>
| >>>                 Pranith
| >>>
| >>>>                 but if I issue:
| >>>>                 root@pve1:~# glusterfs -V
| >>>>                 glusterfs 3.4.4 built on Jun 28 2014 03:44:57
| >>>>                 seems ok.
| >>>>
| >>>>                 server  use 3.4.4 meanwhile
| >>>>                 [2014-08-05 09:23:45.117875] I
| >>>>                 [server-handshake.c:567:server_setvolume]
| >>>>                 0-HA-fast-150G-PVE1-server: accepted client from
| >>>>                 stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
| >>>>                 (version: 3.4.4)
| >>>>                 [2014-08-05 09:23:49.103035] I
| >>>>                 [server-handshake.c:567:server_setvolume]
| >>>>                 0-HA-fast-150G-PVE1-server: accepted client from
| >>>>                 stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
| >>>>                 (version: 3.4.4)
| >>>>
| >>>>                 if this could be the reason, of course.
| >>>>                 I did restart the Proxmox VE yesterday (just for
| >>>>                 information)
| >>>>
| >>>>
| >>>>
| >>>>
| >>>>
| >>>>                 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri
| >>>>                 <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>
| >>>>
| >>>>                     On 08/05/2014 02:33 PM, Roman wrote:
| >>>>>                     Waited long enough for now, still different
| >>>>>                     sizes and no logs about healing :(
| >>>>>
| >>>>>                     stor1
| >>>>>                     # file:
| >>>>>                     exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>>                     root@stor1:~# du -sh
| >>>>>                     /exports/fast-test/150G/images/127/
| >>>>>                     1.2G  /exports/fast-test/150G/images/127/
| >>>>>
| >>>>>
| >>>>>                     stor2
| >>>>>                     # file:
| >>>>>                     exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>>
| >>>>>                     root@stor2:~# du -sh
| >>>>>                     /exports/fast-test/150G/images/127/
| >>>>>                     1.4G  /exports/fast-test/150G/images/127/
| >>>>                     According to the changelogs, the file doesn't
| >>>>                     need any healing. Could you stop the operations
| >>>>                     on the VMs and take md5sum on both these machines?
| >>>>
| >>>>                     Pranith
| >>>>
| >>>>>
| >>>>>
| >>>>>
| >>>>>
| >>>>>                     2014-08-05 11:49 GMT+03:00 Pranith Kumar
| >>>>>                     Karampuri <pkarampu@xxxxxxxxxx
| >>>>>                     <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>
| >>>>>
| >>>>>                         On 08/05/2014 02:06 PM, Roman wrote:
| >>>>>>                         Well, it seems like it doesn't see that
| >>>>>>                         changes were made to the volume? I
| >>>>>>                         created two files, 200 MB and 100 MB (from
| >>>>>>                         /dev/zero), after I disconnected the first
| >>>>>>                         brick. Then I connected it back and got
| >>>>>>                         these logs:
| >>>>>>
| >>>>>>                         [2014-08-05 08:30:37.830150] I
| >>>>>>                         [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>>                         0-glusterfs: No change in volfile, continuing
| >>>>>>                         [2014-08-05 08:30:37.830207] I
| >>>>>>                         [rpc-clnt.c:1676:rpc_clnt_reconfig]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: changing
| >>>>>>                         port to 49153 (from 0)
| >>>>>>                         [2014-08-05 08:30:37.830239] W
| >>>>>>                         [socket.c:514:__socket_rwv]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: readv
| >>>>>>                         failed (No data available)
| >>>>>>                         [2014-08-05 08:30:37.831024] I
| >>>>>>                         [client-handshake.c:1659:select_server_supported_programs]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Using
| >>>>>>                         Program GlusterFS 3.3, Num (1298437),
| >>>>>>                         Version (330)
| >>>>>>                         [2014-08-05 08:30:37.831375] I
| >>>>>>                         [client-handshake.c:1456:client_setvolume_cbk]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Connected
| >>>>>>                         to 10.250.0.1:49153, attached to
| >>>>>>                         remote volume '/exports/fast-test/150G'.
| >>>>>>                         [2014-08-05 08:30:37.831394] I
| >>>>>>                         [client-handshake.c:1468:client_setvolume_cbk]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Server and
| >>>>>>                         Client lk-version numbers are not same,
| >>>>>>                         reopening the fds
| >>>>>>                         [2014-08-05 08:30:37.831566] I
| >>>>>>                         [client-handshake.c:450:client_set_lk_version_cbk]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Server lk
| >>>>>>                         version = 1
| >>>>>>
| >>>>>>
| >>>>>>                         [2014-08-05 08:30:37.830150] I
| >>>>>>                         [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>>                         0-glusterfs: No change in volfile, continuing
| >>>>>>                         this line seems weird to me tbh.
| >>>>>>                         I do not see any traffic on switch
| >>>>>>                         interfaces between gluster servers, which
...

[Message not shown in full]



--
Best regards,
Roman.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
