Re: libgfapi failover problem on replica bricks

Well, there was not much to test. After I took the first server down again (both servers had been up for an hour, which should be enough to sync, if the sync ever starts), I got a segmentation fault on both VMs.

So to me it seems like an issue with healing the stripped files. The heal just never starts, and I don't really know why.

What I do:
1. Create a VM on a replicated volume with a stripped disk.
2. Wait for zero traffic on the switch ports that my gluster servers are connected to.
3. Bring one port down / stop glusterd (and kill all gluster* processes) / reboot one server.
4. Generate files on the VM from /dev/zero (5-6 files of 100-500 MB each).
5. Bring the downed server back up.
6. Wait for the sync, which seems to never start, or starts on the wrong server. (Sometimes I do see traffic for about 5-10 minutes at around 15 Mbps on the server that was not down. That is, server1 was down, but after it comes back up the traffic is between server2 and the Proxmox VE host.)
7. Wait a bit more to be sure.
8. Bring the other server down.
9. Segfault.


2014-08-07 9:03 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
File size increases because of me :) I generate files on the VM from /dev/zero during the outage of one server. Then I bring the downed server back up, and it seems the files never sync. I'll keep testing today. I can't read much from the logs either :(. This morning both VMs (one on a volume with self-healing and the other on a volume without it) survived the second server's outage (the first server was down yesterday). Although the file sizes differ, the VMs ran without problems. But I had restarted them before bringing the second gluster server down.

So I'm a bit lost at the moment. I'll try to keep my tests ordered and report here what happens.


2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:


On 08/07/2014 10:46 AM, Roman wrote:
yes, they do.

getfattr: Removing leading '/' from absolute path names
# file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa

root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2



root@stor2:~# getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa

root@stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
I think the files differ in size because of the sparse file healing issue. Could you raise a bug, with steps to re-create it, for this issue where the size of the file increases after healing?

Pranith
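
A small sketch of why the md5sum output above can match on both bricks while du reports 1.6G on one and 2.6G on the other. It assumes a Linux/Unix filesystem that supports sparse files. The helper names (size_report, make_sparse_file, make_dense_file, md5_of) are made up for this illustration and are not part of any gluster tooling. md5sum reads holes as zeros, so a sparse copy and a fully written copy hash the same, whereas du counts allocated blocks.

```python
import hashlib
import os
import tempfile


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file (Unix only)."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units; this is roughly what du sums up.
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path, size):
    """Create a file of `size` bytes that is one big hole where supported."""
    with open(path, "wb") as f:
        f.truncate(size)


def make_dense_file(path, size, chunk=1 << 20):
    """Create a file of `size` bytes by explicitly writing zeros."""
    zeros = b"\0" * chunk
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n


def md5_of(path, chunk=1 << 20):
    """Hash file contents the way md5sum does (holes read back as zeros)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sparse = os.path.join(d, "sparse.img")
        dense = os.path.join(d, "dense.img")
        make_sparse_file(sparse, 8 * 1024 * 1024)
        make_dense_file(dense, 8 * 1024 * 1024)
        for p in (sparse, dense):
            apparent, allocated = size_report(p)
            print(os.path.basename(p), md5_of(p), apparent, allocated)
```

On a brick, comparing `du -sh` with `du -sh --apparent-size` (or `ls -l`) for the image on both servers gives the same information without any script.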




2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm@xxxxxxxxxx>:



----- Original Message -----
| From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
| To: "Roman" <romeo.r@xxxxxxxxx>
| Cc: gluster-users@xxxxxxxxxxx, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Humble Chirammal" <hchiramm@xxxxxxxxxx>
| Sent: Wednesday, August 6, 2014 12:09:57 PM
| Subject: Re: libgfapi failover problem on replica bricks
|
| Roman,
|      The file went into split-brain. I think we should do these tests
| with 3.5.2, where monitoring the heals is easier. Let me also come up
| with a document about how to do the testing you are trying to do.
|
| Humble/Niels,
|      Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
| issue where /usr/bin/glfsheal was not packaged along with the deb. I
| think that should be fixed now as well?
|
Pranith,

The 3.5.2 packages for Debian are not available yet. We are coordinating internally to get them processed.
I will update the list once they are available.

--Humble
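
For readers following the split-brain diagnosis quoted above: a minimal sketch of how the trusted.afr.* values in the getfattr output further down this thread can be read. It relies on my understanding that each value holds three 32-bit big-endian counters (pending data, metadata and entry operations). The function names decode_afr and classify are invented for this example, and the mapping of the first getfattr output to brick 0 (stor1) and the second to brick 1 (stor2) is an assumption, since the second output is pasted without a prompt.

```python
import struct


def decode_afr(value):
    """Decode a trusted.afr.* value (as printed by getfattr -e hex)
    into (data, metadata, entry) pending-operation counters."""
    hex_digits = value[2:] if value.lower().startswith("0x") else value
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    return struct.unpack(">III", raw)


def classify(b0_about_b1, b1_about_b0):
    """Classify the data-heal state of a two-brick replica.

    b0_about_b1: value of trusted.afr.<vol>-client-1 as read on brick 0
    b1_about_b0: value of trusted.afr.<vol>-client-0 as read on brick 1
    Only the data counter is considered in this sketch.
    """
    pending_on_b1 = decode_afr(b0_about_b1)[0]
    pending_on_b0 = decode_afr(b1_about_b0)[0]
    if pending_on_b1 and pending_on_b0:
        # Each brick accuses the other, so neither can act as the heal source.
        return "split-brain"
    if pending_on_b1:
        return "heal needed: brick 0 -> brick 1"
    if pending_on_b0:
        return "heal needed: brick 1 -> brick 0"
    return "clean"


if __name__ == "__main__":
    # Values copied from the getfattr output for vm-127-disk-1.qcow2.
    stor1_client1 = "0x000001320000000000000000"
    stor2_client0 = "0x000000040000000000000000"
    print(decode_afr(stor1_client1))
    print(decode_afr(stor2_client0))
    print(classify(stor1_client1, stor2_client0))
```

Read this way, each brick records pending data operations against the other, which is the same shape as the "Pending matrix: [ [ 0 60 ] [ 11 0 ] ]" in the self-heal log quoted later in the thread. All-zero values on both bricks mean the changelog sees nothing left to heal.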
|
| On 08/06/2014 11:52 AM, Roman wrote:
| > good morning,
| >
| > root@stor1:~# getfattr -d -m. -e hex
| > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| >
| >
| > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx
| > <mailto:pkarampu@xxxxxxxxxx>>:
| >
| >
| >     On 08/06/2014 11:30 AM, Roman wrote:
| >>     Also, this time files are not the same!
| >>
| >>     root@stor1:~# md5sum
| >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>     32411360c53116b96a059f17306caeda
| >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>
| >>     root@stor2:~# md5sum
| >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>     65b8a6031bcb6f5fb3a11cb1e8b1c9c9
| >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >     What is the getfattr output?
| >
| >     Pranith
| >
| >>
| >>
| >>     2014-08-05 16:33 GMT+03:00 Roman <romeo.r@xxxxxxxxx
| >>     <mailto:romeo.r@xxxxxxxxx>>:
| >>
| >>         Nope, it is not working. But this time it went a bit differently
| >>
| >>         root@gluster-client:~# dmesg
| >>         Segmentation fault
| >>
| >>
| >>         I was not even able to start the VM after I had done the tests
| >>
| >>         Could not read qcow2 header: Operation not permitted
| >>
| >>         And it seems it never starts to sync files after the first
| >>         disconnect. The VM survives the first disconnect, but not the
| >>         second (I waited around 30 minutes). Also, I've
| >>         got network.ping-timeout: 2 in the volume settings, but the logs
| >>         reacted to the first disconnect only after around 30 seconds.
| >>         The second was faster, 2 seconds.
| >>
| >>         Reaction was different also:
| >>
| >>         slower one:
| >>         [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv]
| >>         0-glusterfs: readv failed (Connection timed out)
| >>         [2014-08-05 13:26:19.558485] W
| >>         [socket.c:1962:__socket_proto_state_machine] 0-glusterfs:
| >>         reading from socket failed. Error (Connection timed out),
| >>         peer (10.250.0.1:24007 <http://10.250.0.1:24007>)
| >>         [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv]
| >>         0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
| >>         [2014-08-05 13:26:21.281474] W
| >>         [socket.c:1962:__socket_proto_state_machine]
| >>         0-HA-fast-150G-PVE1-client-0: reading from socket failed.
| >>         Error (Connection timed out), peer (10.250.0.1:49153
| >>         <http://10.250.0.1:49153>)
| >>         [2014-08-05 13:26:21.281507] I
| >>         [client.c:2098:client_rpc_notify]
| >>         0-HA-fast-150G-PVE1-client-0: disconnected
| >>
| >>         the fast one:
| >>         [2014-08-05 12:52:44.607389] C
| >>         [client-handshake.c:127:rpc_client_ping_timer_expired]
| >>         0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
| >>         <http://10.250.0.2:49153> has not responded in the last 2
| >>         seconds, disconnecting.
| >>         [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv]
| >>         0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
| >>         [2014-08-05 12:52:44.607585] E
| >>         [rpc-clnt.c:368:saved_frames_unwind]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >>         [0x7fcb1b4b0558]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >>         [0x7fcb1b4aea63]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >>         unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at
| >>         2014-08-05 12:52:42.463881 (xid=0x381883x)
| >>         [2014-08-05 12:52:44.607604] W
| >>         [client-rpc-fops.c:2624:client3_3_lookup_cbk]
| >>         0-HA-fast-150G-PVE1-client-1: remote operation failed:
| >>         Transport endpoint is not connected. Path: /
| >>         (00000000-0000-0000-0000-000000000001)
| >>         [2014-08-05 12:52:44.607736] E
| >>         [rpc-clnt.c:368:saved_frames_unwind]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >>         [0x7fcb1b4b0558]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >>         [0x7fcb1b4aea63]
| >>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >>         unwinding frame type(GlusterFS Handshake) op(PING(3)) called
| >>         at 2014-08-05 12:52:42.463891 (xid=0x381884x)
| >>         [2014-08-05 12:52:44.607753] W
| >>         [client-handshake.c:276:client_ping_cbk]
| >>         0-HA-fast-150G-PVE1-client-1: timer must have expired
| >>         [2014-08-05 12:52:44.607776] I
| >>         [client.c:2098:client_rpc_notify]
| >>         0-HA-fast-150G-PVE1-client-1: disconnected
| >>
| >>
| >>
| >>         I've got SSD disks (just for an info).
| >>         Should I go and give a try for 3.5.2?
| >>
| >>
| >>
| >>         2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri
| >>         <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>
| >>             reply along with gluster-users please :-). May be you are
| >>             hitting 'reply' instead of 'reply all'?
| >>
| >>             Pranith
| >>
| >>             On 08/05/2014 03:35 PM, Roman wrote:
| >>>             To be sure and start clean, I've created another VM with raw
| >>>             format and am going to repeat those steps. So now I've got
| >>>             two VMs, one with qcow2 format and the other with raw
| >>>             format. I will send another e-mail shortly.
| >>>
| >>>
| >>>             2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri
| >>>             <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>>
| >>>
| >>>                 On 08/05/2014 03:07 PM, Roman wrote:
| >>>>                 really, seems like the same file
| >>>>
| >>>>                 stor1:
| >>>>                 a951641c5230472929836f9fcede6b04
| >>>>                  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>>                 stor2:
| >>>>                 a951641c5230472929836f9fcede6b04
| >>>>                  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>>
| >>>>                 one thing I've seen from logs, that somehow proxmox
| >>>>                 VE is connecting with wrong version to servers?
| >>>>                 [2014-08-05 09:23:45.218550] I
| >>>>                 [client-handshake.c:1659:select_server_supported_programs]
| >>>>                 0-HA-fast-150G-PVE1-client-0: Using Program
| >>>>                 GlusterFS 3.3, Num (1298437), Version (330)
| >>>                 It is the rpc (over the network data structures)
| >>>                 version, which is not changed at all from 3.3 so
| >>>                 thats not a problem. So what is the conclusion? Is
| >>>                 your test case working now or not?
| >>>
| >>>                 Pranith
| >>>
| >>>>                 but if I issue:
| >>>>                 root@pve1:~# glusterfs -V
| >>>>                 glusterfs 3.4.4 built on Jun 28 2014 03:44:57
| >>>>                 seems ok.
| >>>>
| >>>>                 server  use 3.4.4 meanwhile
| >>>>                 [2014-08-05 09:23:45.117875] I
| >>>>                 [server-handshake.c:567:server_setvolume]
| >>>>                 0-HA-fast-150G-PVE1-server: accepted client from
| >>>>                 stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
| >>>>                 (version: 3.4.4)
| >>>>                 [2014-08-05 09:23:49.103035] I
| >>>>                 [server-handshake.c:567:server_setvolume]
| >>>>                 0-HA-fast-150G-PVE1-server: accepted client from
| >>>>                 stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
| >>>>                 (version: 3.4.4)
| >>>>
| >>>>                 if this could be the reason, of course.
| >>>>                 I did restart the Proxmox VE yesterday (just for an
| >>>>                 information)
| >>>>
| >>>>
| >>>>
| >>>>
| >>>>
| >>>>                 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri
| >>>>                 <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>
| >>>>
| >>>>                     On 08/05/2014 02:33 PM, Roman wrote:
| >>>>>                     Waited long enough for now, still different
| >>>>>                     sizes and no logs about healing :(
| >>>>>
| >>>>>                     stor1
| >>>>>                     # file:
| >>>>>                     exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>>                     root@stor1:~# du -sh
| >>>>>                     /exports/fast-test/150G/images/127/
| >>>>>                     1.2G  /exports/fast-test/150G/images/127/
| >>>>>
| >>>>>
| >>>>>                     stor2
| >>>>>                     # file:
| >>>>>                     exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>>
| >>>>>                     root@stor2:~# du -sh
| >>>>>                     /exports/fast-test/150G/images/127/
| >>>>>                     1.4G  /exports/fast-test/150G/images/127/
| >>>>                     According to the changelogs, the file doesn't
| >>>>                     need any healing. Could you stop the operations
| >>>>                     on the VMs and take md5sum on both these machines?
| >>>>
| >>>>                     Pranith
| >>>>
| >>>>>
| >>>>>
| >>>>>
| >>>>>
| >>>>>                     2014-08-05 11:49 GMT+03:00 Pranith Kumar
| >>>>>                     Karampuri <pkarampu@xxxxxxxxxx
| >>>>>                     <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>
| >>>>>
| >>>>>                         On 08/05/2014 02:06 PM, Roman wrote:
| >>>>>>                         Well, it seems like it doesn't see the
| >>>>>>                         changes were made to the volume ? I
| >>>>>>                         created two files 200 and 100 MB (from
| >>>>>>                         /dev/zero) after I disconnected the first
| >>>>>>                         brick. Then connected it back and got
| >>>>>>                         these logs:
| >>>>>>
| >>>>>>                         [2014-08-05 08:30:37.830150] I
| >>>>>>                         [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>>                         0-glusterfs: No change in volfile, continuing
| >>>>>>                         [2014-08-05 08:30:37.830207] I
| >>>>>>                         [rpc-clnt.c:1676:rpc_clnt_reconfig]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: changing
| >>>>>>                         port to 49153 (from 0)
| >>>>>>                         [2014-08-05 08:30:37.830239] W
| >>>>>>                         [socket.c:514:__socket_rwv]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: readv
| >>>>>>                         failed (No data available)
| >>>>>>                         [2014-08-05 08:30:37.831024] I
| >>>>>>                         [client-handshake.c:1659:select_server_supported_programs]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Using
| >>>>>>                         Program GlusterFS 3.3, Num (1298437),
| >>>>>>                         Version (330)
| >>>>>>                         [2014-08-05 08:30:37.831375] I
| >>>>>>                         [client-handshake.c:1456:client_setvolume_cbk]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Connected
| >>>>>>                         to 10.250.0.1:49153
| >>>>>>                         <http://10.250.0.1:49153>, attached to
| >>>>>>                         remote volume '/exports/fast-test/150G'.
| >>>>>>                         [2014-08-05 08:30:37.831394] I
| >>>>>>                         [client-handshake.c:1468:client_setvolume_cbk]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Server and
| >>>>>>                         Client lk-version numbers are not same,
| >>>>>>                         reopening the fds
| >>>>>>                         [2014-08-05 08:30:37.831566] I
| >>>>>>                         [client-handshake.c:450:client_set_lk_version_cbk]
| >>>>>>                         0-HA-fast-150G-PVE1-client-0: Server lk
| >>>>>>                         version = 1
| >>>>>>
| >>>>>>
| >>>>>>                         [2014-08-05 08:30:37.830150] I
| >>>>>>                         [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>>                         0-glusterfs: No change in volfile, continuing
| >>>>>>                         this line seems weird to me tbh.
| >>>>>>                         I do not see any traffic on switch
| >>>>>>                         interfaces between gluster servers, which
| >>>>>>                         means, there is no syncing between them.
| >>>>>>                         I tried to ls -l the files on the client
| >>>>>>                         and servers to trigger the healing, but
| >>>>>>                         seems like no success. Should I wait more?
| >>>>>                         Yes, it should take around 10-15 minutes.
| >>>>>                         Could you provide 'getfattr -d -m. -e hex
| >>>>>                         <file-on-brick>' on both the bricks.
| >>>>>
| >>>>>                         Pranith
| >>>>>
| >>>>>>
| >>>>>>
| >>>>>>                         2014-08-05 11:25 GMT+03:00 Pranith Kumar
| >>>>>>                         Karampuri <pkarampu@xxxxxxxxxx
| >>>>>>                         <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>
| >>>>>>
| >>>>>>                             On 08/05/2014 01:10 PM, Roman wrote:
| >>>>>>>                             Ahha! For some reason I was not able
| >>>>>>>                             to start the VM anymore. Proxmox VE
| >>>>>>>                             told me that it is not able to read
| >>>>>>>                             the qcow2 header because permission
| >>>>>>>                             is denied for some reason. So I just
| >>>>>>>                             deleted that file and created a new
| >>>>>>>                             VM. And the next message I got was
| >>>>>>>                             this:
| >>>>>>                             Seems like these are the messages
| >>>>>>                             where you took down the bricks before
| >>>>>>                             self-heal. Could you restart the run
| >>>>>>                             waiting for self-heals to complete
| >>>>>>                             before taking down the next brick?
| >>>>>>
| >>>>>>                             Pranith
| >>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>>                             [2014-08-05 07:31:25.663412] E
| >>>>>>>                             [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
| >>>>>>>                             0-HA-fast-150G-PVE1-replicate-0:
| >>>>>>>                             Unable to self-heal contents of
| >>>>>>>                             '/images/124/vm-124-disk-1.qcow2'
| >>>>>>>                             (possible split-brain). Please
| >>>>>>>                             delete the file from all but the
| >>>>>>>                             preferred subvolume.- Pending
| >>>>>>>                             matrix:  [ [ 0 60 ] [ 11 0 ] ]
| >>>>>>>                             [2014-08-05 07:31:25.663955] E
| >>>>>>>                             [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
| >>>>>>>                             0-HA-fast-150G-PVE1-replicate-0:
| >>>>>>>                             background  data self-heal failed on
| >>>>>>>                             /images/124/vm-124-disk-1.qcow2
| >>>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>>                             2014-08-05 10:13 GMT+03:00 Pranith
| >>>>>>>                             Kumar Karampuri <pkarampu@xxxxxxxxxx
| >>>>>>>                             <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>
| >>>>>>>                                 I just responded to your earlier
| >>>>>>>                                 mail about how the log looks.
| >>>>>>>                                 The log comes on the mount's logfile
| >>>>>>>
| >>>>>>>                                 Pranith
| >>>>>>>
| >>>>>>>                                 On 08/05/2014 12:41 PM, Roman wrote:
| >>>>>>>>                                 Ok, so I've waited enough, I
| >>>>>>>>                                 think. There was no traffic on
| >>>>>>>>                                 the switch ports between servers.
| >>>>>>>>                                 I could not find any suitable log
| >>>>>>>>                                 message about a completed
| >>>>>>>>                                 self-heal (waited about 30
| >>>>>>>>                                 minutes). Unplugged the other
| >>>>>>>>                                 server's UTP cable this time
| >>>>>>>>                                 and got into the same situation:
| >>>>>>>>                                 root@gluster-test1:~# cat
| >>>>>>>>                                 /var/log/dmesg
| >>>>>>>>                                 -bash: /bin/cat: Input/output error
| >>>>>>>>
| >>>>>>>>                                 brick logs:
| >>>>>>>>                                 [2014-08-05 07:09:03.005474] I
| >>>>>>>>                                 [server.c:762:server_rpc_notify]
| >>>>>>>>                                 0-HA-fast-150G-PVE1-server:
| >>>>>>>>                                 disconnecting connectionfrom
| >>>>>>>>                                 pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>>                                 [2014-08-05 07:09:03.005530] I
| >>>>>>>>                                 [server-helpers.c:729:server_connection_put]
| >>>>>>>>                                 0-HA-fast-150G-PVE1-server:
| >>>>>>>>                                 Shutting down connection
| >>>>>>>>                                 pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>>                                 [2014-08-05 07:09:03.005560] I
| >>>>>>>>                                 [server-helpers.c:463:do_fd_cleanup]
| >>>>>>>>                                 0-HA-fast-150G-PVE1-server: fd
| >>>>>>>>                                 cleanup on
| >>>>>>>>                                 /images/124/vm-124-disk-1.qcow2
| >>>>>>>>                                 [2014-08-05 07:09:03.005797] I
| >>>>>>>>                                 [server-helpers.c:617:server_connection_destroy]
| >>>>>>>>                                 0-HA-fast-150G-PVE1-server:
| >>>>>>>>                                 destroyed connection of
| >>>>>>>>                                 pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>                                 2014-08-05 9:53 GMT+03:00
| >>>>>>>>                                 Pranith Kumar Karampuri
| >>>>>>>>                                 <pkarampu@xxxxxxxxxx
| >>>>>>>>                                 <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>>
| >>>>>>>>                                     Do you think it is possible
| >>>>>>>>                                     for you to do these tests
| >>>>>>>>                                     on the latest version
| >>>>>>>>                                     3.5.2? 'gluster volume heal
| >>>>>>>>                                     <volname> info' would give
| >>>>>>>>                                     you that information in
| >>>>>>>>                                     versions > 3.5.1.
| >>>>>>>>                                     Otherwise you will have to
| >>>>>>>>                                     check it from either the
| >>>>>>>>                                     logs, there will be
| >>>>>>>>                                     self-heal completed message
| >>>>>>>>                                     on the mount logs (or) by
| >>>>>>>>                                     observing 'getfattr -d -m.
| >>>>>>>>                                     -e hex <image-file-on-bricks>'
| >>>>>>>>
| >>>>>>>>                                     Pranith
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>                                     On 08/05/2014 12:09 PM,
| >>>>>>>>                                     Roman wrote:
| >>>>>>>>>                                     Ok, I understand. I will
| >>>>>>>>>                                     try this shortly.
| >>>>>>>>>                                     How can I be sure, that
| >>>>>>>>>                                     healing process is done,
| >>>>>>>>>                                     if I am not able to see
| >>>>>>>>>                                     its status?
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>>                                     2014-08-05 9:30 GMT+03:00
| >>>>>>>>>                                     Pranith Kumar Karampuri
| >>>>>>>>>                                     <pkarampu@xxxxxxxxxx
| >>>>>>>>>                                     <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>>>
| >>>>>>>>>                                         Mounts will do the
| >>>>>>>>>                                         healing, not the
| >>>>>>>>>                                         self-heal-daemon. The
| >>>>>>>>>                                         problem I feel is that
| >>>>>>>>>                                         whichever process does
| >>>>>>>>>                                         the healing has the
| >>>>>>>>>                                         latest information
| >>>>>>>>>                                         about the good bricks
| >>>>>>>>>                                         in this usecase. Since
| >>>>>>>>>                                         for VM usecase, mounts
| >>>>>>>>>                                         should have the latest
| >>>>>>>>>                                         information, we should
| >>>>>>>>>                                         let the mounts do the
| >>>>>>>>>                                         healing. If the mount
| >>>>>>>>>                                         accesses the VM image
| >>>>>>>>>                                         either by someone
| >>>>>>>>>                                         doing operations
| >>>>>>>>>                                         inside the VM or
| >>>>>>>>>                                         explicit stat on the
| >>>>>>>>>                                         file it should do the
| >>>>>>>>>                                         healing.
| >>>>>>>>>
| >>>>>>>>>                                         Pranith.
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>>                                         On 08/05/2014 10:39
| >>>>>>>>>                                         AM, Roman wrote:
| >>>>>>>>>>                                         Hmmm, you told me to
| >>>>>>>>>>                                         turn it off. Did I
| >>>>>>>>>>                                         misunderstand
| >>>>>>>>>>                                         something? After I issued
| >>>>>>>>>>                                         the command you
| >>>>>>>>>>                                         sent me, I was not
| >>>>>>>>>>                                         able to watch the
| >>>>>>>>>>                                         healing process; it
| >>>>>>>>>>                                         said it won't be
| >>>>>>>>>>                                         healed because it's
| >>>>>>>>>>                                         turned off.
| >>>>>>>>>>
| >>>>>>>>>>
| >>>>>>>>>>                                         2014-08-05 5:39
| >>>>>>>>>>                                         GMT+03:00 Pranith
| >>>>>>>>>>                                         Kumar Karampuri
| >>>>>>>>>>                                         <pkarampu@xxxxxxxxxx
| >>>>>>>>>>                                         <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>>>>
| >>>>>>>>>>                                             You didn't
| >>>>>>>>>>                                             mention anything
| >>>>>>>>>>                                             about
| >>>>>>>>>>                                             self-healing. Did
| >>>>>>>>>>                                             you wait until
| >>>>>>>>>>                                             the self-heal is
| >>>>>>>>>>                                             complete?
| >>>>>>>>>>
| >>>>>>>>>>                                             Pranith
| >>>>>>>>>>
| >>>>>>>>>>                                             On 08/04/2014
| >>>>>>>>>>                                             05:49 PM, Roman
| >>>>>>>>>>                                             wrote:
| >>>>>>>>>>>                                             Hi!
| >>>>>>>>>>>                                             The result is
| >>>>>>>>>>>                                             pretty much the
| >>>>>>>>>>>                                             same. I set the
| >>>>>>>>>>>                                             switch port down
| >>>>>>>>>>>                                             for the 1st
| >>>>>>>>>>>                                             server, and it
| >>>>>>>>>>>                                             was OK. Then I
| >>>>>>>>>>>                                             brought it back
| >>>>>>>>>>>                                             up and set the
| >>>>>>>>>>>                                             other server's
| >>>>>>>>>>>                                             port down, and
| >>>>>>>>>>>                                             it triggered IO
| >>>>>>>>>>>                                             errors on two
| >>>>>>>>>>>                                             virtual
| >>>>>>>>>>>                                             machines: one
| >>>>>>>>>>>                                             with a local
| >>>>>>>>>>>                                             root FS but
| >>>>>>>>>>>                                             network-mounted
| >>>>>>>>>>>                                             storage, and the
| >>>>>>>>>>>                                             other with a
| >>>>>>>>>>>                                             network root FS.
| >>>>>>>>>>>                                             The 1st gave an
| >>>>>>>>>>>                                             error on copying
| >>>>>>>>>>>                                             to or from the
| >>>>>>>>>>>                                             mounted network
| >>>>>>>>>>>                                             disk; the other
| >>>>>>>>>>>                                             gave me an error
| >>>>>>>>>>>                                             even when
| >>>>>>>>>>>                                             reading log
| >>>>>>>>>>>                                             files.
| >>>>>>>>>>>
| >>>>>>>>>>>                                             cat:
| >>>>>>>>>>>                                             /var/log/alternatives.log:
| >>>>>>>>>>>                                             Input/output error
| >>>>>>>>>>>                                             Then I reset the
| >>>>>>>>>>>                                             KVM VM and it
| >>>>>>>>>>>                                             told me there
| >>>>>>>>>>>                                             is no boot
| >>>>>>>>>>>                                             device. Next I
| >>>>>>>>>>>                                             virtually
| >>>>>>>>>>>                                             powered it off
| >>>>>>>>>>>                                             and then back
| >>>>>>>>>>>                                             on, and it booted.
| >>>>>>>>>>>
| >>>>>>>>>>>                                             By the way, did
| >>>>>>>>>>>                                             I have to
| >>>>>>>>>>>                                             start/stop the
| >>>>>>>>>>>                                             volume?
| >>>>>>>>>>>
| >>>>>>>>>>>                                             >> Could you do
| >>>>>>>>>>>                                             the following
| >>>>>>>>>>>                                             and test it again?
| >>>>>>>>>>>                                             >> gluster volume
| >>>>>>>>>>>                                             set <volname>
| >>>>>>>>>>>                                             cluster.self-heal-daemon
| >>>>>>>>>>>                                             off
| >>>>>>>>>>>
| >>>>>>>>>>>                                             >>Pranith
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>                                             2014-08-04 14:10
| >>>>>>>>>>>                                             GMT+03:00
| >>>>>>>>>>>                                             Pranith Kumar
| >>>>>>>>>>>                                             Karampuri
| >>>>>>>>>>>                                             <pkarampu@xxxxxxxxxx
| >>>>>>>>>>>                                             <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>                                                 On
| >>>>>>>>>>>                                                 08/04/2014
| >>>>>>>>>>>                                                 03:33 PM,
| >>>>>>>>>>>                                                 Roman wrote:
| >>>>>>>>>>>>                                                 Hello!
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 Facing the
| >>>>>>>>>>>>                                                 same
| >>>>>>>>>>>>                                                 problem as
| >>>>>>>>>>>>                                                 mentioned
| >>>>>>>>>>>>                                                 here:
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 my set up
| >>>>>>>>>>>>                                                 is up and
| >>>>>>>>>>>>                                                 running, so
| >>>>>>>>>>>>                                                 i'm ready
| >>>>>>>>>>>>                                                 to help you
| >>>>>>>>>>>>                                                 back with
| >>>>>>>>>>>>                                                 feedback.
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 setup:
| >>>>>>>>>>>>                                                 proxmox
| >>>>>>>>>>>>                                                 server as
| >>>>>>>>>>>>                                                 client
| >>>>>>>>>>>>                                                 2 gluster
| >>>>>>>>>>>>                                                 physical
| >>>>>>>>>>>>                                                  servers
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 server side
| >>>>>>>>>>>>                                                 and client
| >>>>>>>>>>>>                                                 side both
| >>>>>>>>>>>>                                                 running atm
| >>>>>>>>>>>>                                                 3.4.4
| >>>>>>>>>>>>                                                 glusterfs
| >>>>>>>>>>>>                                                 from
| >>>>>>>>>>>>                                                 gluster repo.
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 the problem is:
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 1. created
| >>>>>>>>>>>>                                                 replica bricks.
| >>>>>>>>>>>>                                                 2. mounted
| >>>>>>>>>>>>                                                 in proxmox
| >>>>>>>>>>>>                                                 (tried both
| >>>>>>>>>>>>                                                 proxmox
| >>>>>>>>>>>>                                                 ways: via
| >>>>>>>>>>>>                                                 GUI and
| >>>>>>>>>>>>                                                 fstab (with
| >>>>>>>>>>>>                                                 backup
| >>>>>>>>>>>>                                                 volume
| >>>>>>>>>>>>                                                 line), btw
| >>>>>>>>>>>>                                                 while
| >>>>>>>>>>>>                                                 mounting
| >>>>>>>>>>>>                                                 via fstab
| >>>>>>>>>>>>                                                 I'm unable
| >>>>>>>>>>>>                                                 to launch a
| >>>>>>>>>>>>                                                 VM without
| >>>>>>>>>>>>                                                 cache,
| >>>>>>>>>>>>                                                 meanwhile
| >>>>>>>>>>>>                                                 direct-io-mode
| >>>>>>>>>>>>                                                 is enabled
| >>>>>>>>>>>>                                                 in fstab line)
| >>>>>>>>>>>>                                                 3. installed VM
| >>>>>>>>>>>>                                                 4. bring
| >>>>>>>>>>>>                                                 one volume
| >>>>>>>>>>>>                                                 down - ok
| >>>>>>>>>>>>                                                 5. bringing
| >>>>>>>>>>>>                                                 up, waiting
| >>>>>>>>>>>>                                                 for sync is
| >>>>>>>>>>>>                                                 done.
| >>>>>>>>>>>>                                                 6. bring
| >>>>>>>>>>>>                                                 other
| >>>>>>>>>>>>                                                 volume down
| >>>>>>>>>>>>                                                 - getting
| >>>>>>>>>>>>                                                 IO errors
| >>>>>>>>>>>>                                                 on VM guest
| >>>>>>>>>>>>                                                 and not
| >>>>>>>>>>>>                                                 able to
| >>>>>>>>>>>>                                                 restore the
| >>>>>>>>>>>>                                                 VM after I
| >>>>>>>>>>>>                                                 reset the
| >>>>>>>>>>>>                                                 VM via
| >>>>>>>>>>>>                                                 host. It
| >>>>>>>>>>>>                                                 says (no
| >>>>>>>>>>>>                                                 bootable
| >>>>>>>>>>>>                                                 media).
| >>>>>>>>>>>>                                                 After I
| >>>>>>>>>>>>                                                 shut it
| >>>>>>>>>>>>                                                 down
| >>>>>>>>>>>>                                                 (forced)
| >>>>>>>>>>>>                                                 and bring
| >>>>>>>>>>>>                                                 back up, it
| >>>>>>>>>>>>                                                 boots.
| >>>>>>>>>>>                                                 Could you do
| >>>>>>>>>>>                                                 the
| >>>>>>>>>>>                                                 following
| >>>>>>>>>>>                                                 and test it
| >>>>>>>>>>>                                                 again?
| >>>>>>>>>>>                                                 gluster
| >>>>>>>>>>>                                                 volume set
| >>>>>>>>>>>                                                 <volname>
| >>>>>>>>>>>                                                 cluster.self-heal-daemon
| >>>>>>>>>>>                                                 off
| >>>>>>>>>>>
| >>>>>>>>>>>                                                 Pranith
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 Need help.
| >>>>>>>>>>>>                                                 Tried
| >>>>>>>>>>>>                                                 3.4.3, 3.4.4.
| >>>>>>>>>>>>                                                 Still
| >>>>>>>>>>>>                                                 missing
| >>>>>>>>>>>>                                                 pkg-s for
| >>>>>>>>>>>>                                                 3.4.5 for
| >>>>>>>>>>>>                                                 debian and
| >>>>>>>>>>>>                                                 3.5.2
| >>>>>>>>>>>>                                                 (3.5.1
| >>>>>>>>>>>>                                                 always
| >>>>>>>>>>>>                                                 gives a
| >>>>>>>>>>>>                                                 healing
| >>>>>>>>>>>>                                                 error for
| >>>>>>>>>>>>                                                 some reason)
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 --
| >>>>>>>>>>>>                                                 Best regards,
| >>>>>>>>>>>>                                                 Roman.
| >>>>>>>>>>>>
| >>>>>>>>>>>>
| >>>>>>>>>>>>                                                 _______________________________________________
| >>>>>>>>>>>>                                                 Gluster-users
| >>>>>>>>>>>>                                                 mailing list
| >>>>>>>>>>>>                                                 Gluster-users@xxxxxxxxxxx
| >>>>>>>>>>>>                                                 <mailto:Gluster-users@xxxxxxxxxxx>
| >>>>>>>>>>>>                                                 http://supercolony.gluster.org/mailman/listinfo/gluster-users
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>                                             --
| >>>>>>>>>>>                                             Best regards,
| >>>>>>>>>>>                                             Roman.
|
|
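
On the question of whether the self-heal had finished before the second server was taken down: below is a minimal sketch of checking that from a script instead of watching switch traffic. It assumes that `gluster volume heal <volname> info` prints one `Brick ...` line per brick together with a `Number of entries: N` line. The function names are made up for illustration, and the exact output layout may differ between GlusterFS versions, so treat the parsing as an assumption to verify against your own output.

```python
import re
import subprocess

# Assumed layout of "gluster volume heal <volname> info" output:
#   Brick server1:/export/brick1
#   Number of entries: 0
BRICK_RE = re.compile(r"^Brick\s+(\S+)")
COUNT_RE = re.compile(r"^Number of entries:\s*(\S+)")


def parse_heal_info(output):
    """Map each brick to its pending-heal count (None if the count is not a number)."""
    counts = {}
    brick = None
    for raw in output.splitlines():
        line = raw.strip()
        match = BRICK_RE.match(line)
        if match:
            brick = match.group(1)
            continue
        match = COUNT_RE.match(line)
        if match and brick is not None:
            value = match.group(1)
            counts[brick] = int(value) if value.isdigit() else None
            brick = None
    return counts


def heal_complete(output):
    """True only if at least one brick was reported and every brick has 0 entries."""
    counts = parse_heal_info(output)
    return bool(counts) and all(count == 0 for count in counts.values())


def heal_info(volname):
    """Run the gluster CLI on a server node and return its raw text output."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volname, "info"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

With something like this, step 8 (bringing the other server down) would only be attempted once `heal_complete(heal_info("<volname>"))` returns True. A brick whose count cannot be parsed is treated as not healed, which errs on the side of waiting.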



--
Best regards,
Roman.




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
