----- Original Message -----
| From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
| To: "Roman" <romeo.r@xxxxxxxxx>
| Cc: gluster-users@xxxxxxxxxxx, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Humble Chirammal" <hchiramm@xxxxxxxxxx>
| Sent: Wednesday, August 6, 2014 12:09:57 PM
| Subject: Re: libgfapi failover problem on replica bricks
|
| Roman,
| The file went into split-brain. I think we should do these tests with 3.5.2, where monitoring the heals is easier. Let me also come up with a document about how to do the testing you are trying to do.
|
| Humble/Niels,
| Do we have debs available for 3.5.2? In 3.5.1 there was a packaging issue where /usr/bin/glfsheal was not packaged along with the deb. I think that should be fixed now as well?

Pranith,

The 3.5.2 packages for Debian are not available yet. We are coordinating internally to get them processed. I will update the list once they are available.

--Humble

| On 08/06/2014 11:52 AM, Roman wrote:
| > Good morning,
| >
| > root@stor1:~# getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| >
| > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >
| > On 08/06/2014 11:30 AM, Roman wrote:
| >> Also, this time the files are not the same!
| >>
| >> root@stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >> 32411360c53116b96a059f17306caeda  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>
| >> root@stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > What is the getfattr output?
| >
| > Pranith
| >>
| >> 2014-08-05 16:33 GMT+03:00 Roman <romeo.r@xxxxxxxxx>:
| >>
| >> Nope, it is not working. But this time it went a bit differently.
| >>
| >> root@gluster-client:~# dmesg
| >> Segmentation fault
| >>
| >> I was not even able to start the VM after I had done the tests:
| >>
| >> Could not read qcow2 header: Operation not permitted
| >>
| >> And it seems it never starts to sync the files after the first disconnect. The VM survives the first disconnect, but not the second (I waited around 30 minutes). Also, I've got network.ping-timeout: 2 in the volume settings, but the logs reacted to the first disconnect in around 30 seconds. The second was faster, 2 seconds.
| >>
| >> The reaction was also different.
| >>
| >> The slower one:
| >> [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv] 0-glusterfs: readv failed (Connection timed out)
| >> [2014-08-05 13:26:19.558485] W [socket.c:1962:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:24007)
| >> [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
| >> [2014-08-05 13:26:21.281474] W [socket.c:1962:__socket_proto_state_machine] 0-HA-fast-150G-PVE1-client-0: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:49153)
| >> [2014-08-05 13:26:21.281507] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-0: disconnected
| >>
| >> The fast one:
| >> [2014-08-05 12:52:44.607389] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 has not responded in the last 2 seconds, disconnecting.
| >> [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
| >> [2014-08-05 12:52:44.607585] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2014-08-05 12:52:42.463881 (xid=0x381883x)
| >> [2014-08-05 12:52:44.607604] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-HA-fast-150G-PVE1-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
| >> [2014-08-05 12:52:44.607736] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2014-08-05 12:52:42.463891 (xid=0x381884x)
| >> [2014-08-05 12:52:44.607753] W [client-handshake.c:276:client_ping_cbk] 0-HA-fast-150G-PVE1-client-1: timer must have expired
| >> [2014-08-05 12:52:44.607776] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-1: disconnected
| >>
| >> I've got SSD disks (just for information).
| >> Should I go and give 3.5.2 a try?
| >>
| >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>
| >> Reply along with gluster-users please :-). Maybe you are hitting 'reply' instead of 'reply all'?
| >>
| >> Pranith
| >>
| >> On 08/05/2014 03:35 PM, Roman wrote:
| >>> To make sure and keep things clean, I've created another VM with raw format and am going to repeat those steps. So now I've got two VMs, one with qcow2 format and the other with raw format. I will send another e-mail shortly.
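For reference, the split-brain reported at the top of this thread can be read directly from the trusted.afr changelog xattrs Roman posted. A minimal sketch of the check, assuming the brick path used in this thread; each 12-byte trusted.afr value is normally read as three 4-byte counters (data, metadata, entry pending operations), in hex:

    # run on each storage server against the file on the brick, not on the mount
    getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

    # In the output above:
    #   on stor1: trusted.afr.HA-fast-150G-PVE1-client-1=0x00000132...  -> stor1 records 0x132 (306) pending data ops against stor2
    #   on stor2: trusted.afr.HA-fast-150G-PVE1-client-0=0x00000004...  -> stor2 records 4 pending data ops against stor1
    # Each copy accuses the other of missing writes, so AFR cannot pick a source
    # automatically; that mutual accusation is what gets reported as split-brain.
    # The "Pending matrix: [ [ 0 60 ] [ 11 0 ] ]" printed later in the thread encodes
    # the same situation: roughly, row i holds the counts brick i has recorded against each brick.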
| >>>
| >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>
| >>> On 08/05/2014 03:07 PM, Roman wrote:
| >>>> Really, it seems like the same file:
| >>>>
| >>>> stor1:
| >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>> stor2:
| >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>> One thing I've seen from the logs: somehow Proxmox VE is connecting to the servers with the wrong version?
| >>>> [2014-08-05 09:23:45.218550] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
| >>> It is the RPC (over-the-network data structures) version, which has not changed at all since 3.3, so that's not a problem. So what is the conclusion? Is your test case working now or not?
| >>>
| >>> Pranith
| >>>
| >>>> But if I issue:
| >>>> root@pve1:~# glusterfs -V
| >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
| >>>> it seems ok.
| >>>>
| >>>> The servers use 3.4.4 meanwhile:
| >>>> [2014-08-05 09:23:45.117875] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 (version: 3.4.4)
| >>>> [2014-08-05 09:23:49.103035] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
| >>>>
| >>>> If this could be the reason, of course. I did restart the Proxmox VE yesterday (just for information).
| >>>>
| >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>
| >>>> On 08/05/2014 02:33 PM, Roman wrote:
| >>>>> Waited long enough for now, still different sizes and no logs about healing :(
| >>>>>
| >>>>> stor1
| >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>> root@stor1:~# du -sh /exports/fast-test/150G/images/127/
| >>>>> 1.2G /exports/fast-test/150G/images/127/
| >>>>>
| >>>>> stor2
| >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>> root@stor2:~# du -sh /exports/fast-test/150G/images/127/
| >>>>> 1.4G /exports/fast-test/150G/images/127/
| >>>> According to the changelogs, the file doesn't need any healing. Could you stop the operations on the VMs and take an md5sum on both these machines?
| >>>>
| >>>> Pranith
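For reference, a minimal sketch of the consistency check requested above, using the paths from this thread; the md5sums are only comparable while nothing is writing to the image, so the VM should be stopped first (the qm command and VM id 127 are assumptions about the Proxmox side):

    # on the Proxmox host: quiesce the image (hypothetical, Proxmox qm CLI, VM id 127)
    qm stop 127

    # then on each storage server, against the brick copy (not the mount):
    md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
    getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

    # matching md5sums plus all-zero trusted.afr.* values on both bricks mean the
    # replicas are in sync; differing contents with all-zero changelogs would point
    # to a change that was never recorded, which is what this part of the thread is probing.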
| >>>>>
| >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>>
| >>>>> On 08/05/2014 02:06 PM, Roman wrote:
| >>>>>> Well, it seems like it doesn't see the changes that were made to the volume? I created two files, 200 and 100 MB (from /dev/zero), after I disconnected the first brick. Then I connected it back and got these logs:
| >>>>>>
| >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
| >>>>>> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
| >>>>>> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
| >>>>>> [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
| >>>>>> [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
| >>>>>> [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
| >>>>>> [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
| >>>>>>
| >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
| >>>>>> This line seems weird to me, tbh.
| >>>>>> I do not see any traffic on the switch interfaces between the gluster servers, which means there is no syncing between them. I tried to ls -l the files on the client and servers to trigger the healing, but it seems there was no success. Should I wait longer?
| >>>>> Yes, it should take around 10-15 minutes. Could you provide 'getfattr -d -m. -e hex <file-on-brick>' from both the bricks?
| >>>>>
| >>>>> Pranith
| >>>>>>
| >>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>>>
| >>>>>> On 08/05/2014 01:10 PM, Roman wrote:
| >>>>>>> Aha! For some reason I was not able to start the VM anymore; Proxmox VE told me that it was not able to read the qcow2 header because permission was denied for some reason. So I just deleted that file and created a new VM. And the next message I got was this:
| >>>>>> Seems like these are the messages from when you took down the bricks before the self-heal completed. Could you restart the run, waiting for self-heals to complete before taking down the next brick?
| >>>>>>
| >>>>>> Pranith
| >>>>>>>
| >>>>>>> [2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 60 ] [ 11 0 ] ]
| >>>>>>> [2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed on /images/124/vm-124-disk-1.qcow2
| >>>>>>>
| >>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>>>>
| >>>>>>> I just responded to your earlier mail about how the log looks. The log appears in the mount's logfile.
| >>>>>>>
| >>>>>>> Pranith
| >>>>>>>
| >>>>>>> On 08/05/2014 12:41 PM, Roman wrote:
| >>>>>>>> OK, so I've waited long enough, I think. There was no traffic at all on the switch ports between the servers, and I could not find any suitable log message about a completed self-heal (waited about 30 minutes). I unplugged the other server's UTP cable this time and ended up in the same situation:
| >>>>>>>> root@gluster-test1:~# cat /var/log/dmesg
| >>>>>>>> -bash: /bin/cat: Input/output error
| >>>>>>>>
| >>>>>>>> Brick logs:
| >>>>>>>> [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>> [2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>> [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
| >>>>>>>> [2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>>
| >>>>>>>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>>>>>
| >>>>>>>> Do you think it is possible for you to do these tests on the latest version, 3.5.2? 'gluster volume heal <volname> info' would give you that information in versions > 3.5.1. Otherwise you will have to check it either from the logs (there will be a self-heal completed message in the mount logs) or by observing 'getfattr -d -m. -e hex <image-file-on-bricks>'.
| >>>>>>>>
| >>>>>>>> Pranith
| >>>>>>>>
| >>>>>>>> On 08/05/2014 12:09 PM, Roman wrote:
| >>>>>>>>> OK, I understand. I will try this shortly. How can I be sure that the healing process is done if I am not able to see its status?
| >>>>>>>>>
| >>>>>>>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>>>>>>
| >>>>>>>>> Mounts will do the healing, not the self-heal-daemon. The problem, I feel, is that whichever process does the healing should have the latest information about the good bricks in this use case. Since for the VM use case the mounts should have the latest information, we should let the mounts do the healing. If the mount accesses the VM image, either by someone doing operations inside the VM or by an explicit stat on the file, it should do the healing.
| >>>>>>>>>
| >>>>>>>>> Pranith.
| >>>>>>>>>
| >>>>>>>>> On 08/05/2014 10:39 AM, Roman wrote:
| >>>>>>>>>> Hmmm, you told me to turn it off. Did I understand something wrong? After I issued the command you sent me, I was not able to watch the healing process; it said it won't be healed because it's turned off.
| >>>>>>>>>>
| >>>>>>>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>>>>>>>
| >>>>>>>>>> You didn't mention anything about self-healing. Did you wait until the self-heal was complete?
| >>>>>>>>>>
| >>>>>>>>>> Pranith
| >>>>>>>>>>
| >>>>>>>>>> On 08/04/2014 05:49 PM, Roman wrote:
| >>>>>>>>>>> Hi!
| >>>>>>>>>>> The result is pretty much the same. I set the switch port down for the 1st server; that was OK. Then I brought it back up and set the other server's port off, and it triggered an I/O error on two virtual machines: one with a local root FS but network-mounted storage, and the other with a network root FS. The 1st gave an error on copying to or from the mounted network disk; the other gave me an error even for reading log files.
| >>>>>>>>>>>
| >>>>>>>>>>> cat: /var/log/alternatives.log: Input/output error
| >>>>>>>>>>>
| >>>>>>>>>>> Then I reset the KVM VM and it told me there is no boot device. Next I virtually powered it off and then back on, and it booted.
| >>>>>>>>>>>
| >>>>>>>>>>> By the way, did I have to start/stop the volume?
| >>>>>>>>>>>
| >>>>>>>>>>> >> Could you do the following and test it again?
| >>>>>>>>>>> >> gluster volume set <volname> cluster.self-heal-daemon off
| >>>>>>>>>>>
| >>>>>>>>>>> >> Pranith
| >>>>>>>>>>>
| >>>>>>>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
| >>>>>>>>>>>
| >>>>>>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
| >>>>>>>>>>>> Hello!
| >>>>>>>>>>>>
| >>>>>>>>>>>> I'm facing the same problem as mentioned here:
| >>>>>>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
| >>>>>>>>>>>>
| >>>>>>>>>>>> My setup is up and running, so I'm ready to help you back with feedback.
| >>>>>>>>>>>>
| >>>>>>>>>>>> Setup:
| >>>>>>>>>>>> Proxmox server as the client
| >>>>>>>>>>>> 2 physical gluster servers
| >>>>>>>>>>>>
| >>>>>>>>>>>> The server side and the client side are both running glusterfs 3.4.4 atm, from the gluster repo.
| >>>>>>>>>>>>
| >>>>>>>>>>>> The problem is:
| >>>>>>>>>>>>
| >>>>>>>>>>>> 1. Created the replica bricks.
| >>>>>>>>>>>> 2. Mounted in Proxmox (tried both Proxmox ways: via the GUI and via fstab (with the backup volume line); btw, while mounting via fstab I'm unable to launch a VM without cache, even though direct-io-mode is enabled in the fstab line).
| >>>>>>>>>>>> 3. Installed a VM.
| >>>>>>>>>>>> 4. Brought one brick down - OK.
| >>>>>>>>>>>> 5. Brought it back up, waited for the sync to be done.
| >>>>>>>>>>>> 6. Brought the other brick down - got I/O errors on the VM guest and was not able to restore the VM after I reset the VM via the host. It says (no bootable media). After I shut it down (forced) and brought it back up, it boots.
| >>>>>>>>>>> Could you do the following and test it again?
| >>>>>>>>>>> gluster volume set <volname> cluster.self-heal-daemon off
| >>>>>>>>>>>
| >>>>>>>>>>> Pranith
| >>>>>>>>>>>>
| >>>>>>>>>>>> Need help. Tried 3.4.3, 3.4.4. Still missing packages for 3.4.5 for Debian, and for 3.5.2 (3.5.1 always gives a healing error for some reason).
| >>>>>>>>>>>>
| >>>>>>>>>>>> --
| >>>>>>>>>>>> Best regards,
| >>>>>>>>>>>> Roman.
| >>>>>>>>>>>>
| >>>>>>>>>>>> _______________________________________________
| >>>>>>>>>>>> Gluster-users mailing list
| >>>>>>>>>>>> Gluster-users@xxxxxxxxxxx
| >>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
| >>>>>>>>>>>
| >>>>>>>>>>> --
| >>>>>>>>>>> Best regards,
| >>>>>>>>>>> Roman.
| >>>>>>>>>>
| >>>>>>>>>> --
| >>>>>>>>>> Best regards,
| >>>>>>>>>> Roman.
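For reference, a minimal sketch of the test sequence suggested in this thread, assuming the volume name HA-fast-150G-PVE1 and a hypothetical Proxmox mount point /mnt/pve/glusterfs; the heal info command applies only to 3.5.1 and later, as noted above:

    # let the mount (not the self-heal daemon) do the healing, as suggested above
    gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off

    # take one brick down, write inside the VM, bring the brick back up, then
    # trigger the heal from the client by accessing the image through the mount
    # (the mount path is an assumption about the Proxmox setup)
    stat /mnt/pve/glusterfs/images/127/vm-127-disk-1.qcow2

    # wait for the heal to finish before taking the other brick down; on 3.4 check
    # the trusted.afr.* values on both bricks (all zeros = nothing pending) or watch
    # the mount log for the self-heal completed message
    getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

    # on 3.5.2 (once the Debian packages are available) the same check is simply:
    gluster volume heal HA-fast-150G-PVE1 info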
| >
| > --
| > Best regards,
| > Roman.
|
| _______________________________________________
| Gluster-users mailing list
| Gluster-users@xxxxxxxxxxx
| http://supercolony.gluster.org/mailman/listinfo/gluster-users