root@stor1:~# ls -l /usr/sbin/glfsheal
ls: cannot access /usr/sbin/glfsheal: No such file or directory
Seems like it's not.
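For reference, on a Debian/Wheezy box dpkg can show which package is supposed to ship that binary; a minimal sketch, where the glusterfs-server package name is an assumption that may differ per repository:

# does any installed package own the path?
dpkg -S /usr/sbin/glfsheal
# what did the server package actually install?
dpkg -L glusterfs-server | grep -i glfsheal
# installed gluster packages and their versions
dpkg -l | grep -i gluster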
2014-08-27 9:50 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
On 08/27/2014 11:53 AM, Roman wrote:
Can you check if the following binary is present?
Okay, so here are the first results:
after I disconnected the first server, I've got this:
root@stor2:~# gluster volume heal HA-FAST-PVE1-150G info
Volume heal failed
/usr/sbin/glfsheal
Pranith
but:
[2014-08-26 11:45:35.315974] I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-HA-FAST-PVE1-150G-replicate-0: foreground data self heal is successfully completed, data self heal from HA-FAST-PVE1-150G-client-1 to sinks HA-FAST-PVE1-150G-client-0, with 16108814336 bytes on HA-FAST-PVE1-150G-client-0, 16108814336 bytes on HA-FAST-PVE1-150G-client-1, data - Pending matrix: [ [ 0 0 ] [ 348 0 ] ] on <gfid:e3ede9c6-28d6-4755-841a-d8329e42ccc4>
Something wrong during the upgrade?
I've got two VMs on different volumes: one with HD (the self-heal daemon) on and the other with it off. Both survived the outage and both seemed synced.
But today I found what looks like a bug with log rotation.
Logs rotated on both the server and client sides, but they are still being written to the *.log.1 files :)
/var/log/glusterfs/mnt-pve-HA-MED-PVE1-1T.log.1
/var/log/glusterfs/glustershd.log.1
This behavior appeared after the upgrade.
The logrotate.d conf files include the HUP for the gluster pids.
client:
/var/log/glusterfs/*.log {
        daily
        rotate 7
        delaycompress
        compress
        notifempty
        missingok
        postrotate
        [ ! -f /var/run/glusterd.pid ] || kill -HUP `cat /var/run/glusterd.pid`
        endscript
}
but I'm not able to ls the pid file on the client side (should it be there?) :(
and servers:
/var/log/glusterfs/*.log {
        daily
        rotate 7
        delaycompress
        compress
        notifempty
        missingok
        postrotate
        [ ! -f /var/run/glusterd.pid ] || kill -HUP `cat /var/run/glusterd.pid`
        endscript
}

/var/log/glusterfs/*/*.log {
        daily
        rotate 7
        delaycompress
        compress
        notifempty
        missingok
        copytruncate
        postrotate
        [ ! -f /var/run/glusterd.pid ] || kill -HUP `cat /var/run/glusterd.pid`
        endscript
}
I do have /var/run/glusterd.pid on server side.
Should I change something? Log rotation seems to be broken.
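For reference, one possible workaround (not the official packaging fix): the brick and client processes keep their log file handles open and only glusterd gets the HUP, so either let logrotate truncate in place with copytruncate or signal all gluster processes. A minimal sketch using the same paths as above:

/var/log/glusterfs/*.log /var/log/glusterfs/*/*.log {
        daily
        rotate 7
        missingok
        notifempty
        compress
        delaycompress
        copytruncate
}

Alternatively, keep the postrotate script but send the HUP to every gluster process (they should reopen their logs on SIGHUP), e.g. killall -HUP glusterd glusterfsd glusterfs.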
2014-08-26 9:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
Welcome back :-). If you set it to off, the test case you execute should work (please validate :-) ). But we also need to test it with self-heal-daemon 'on' and fix any bugs if that test case does not work.
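For reference, a sketch of the toggle and the check being discussed, using the volume name from earlier in this thread:

# run the disconnect test with the self-heal daemon disabled
gluster volume set HA-FAST-PVE1-150G cluster.self-heal-daemon off
# ...repeat the disconnect/reconnect test...
# then turn it back on, repeat, and watch for pending heals
gluster volume set HA-FAST-PVE1-150G cluster.self-heal-daemon on
gluster volume heal HA-FAST-PVE1-150G info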
On 08/26/2014 11:55 AM, Roman wrote:
Hello all again! I'm back from vacation and I'm pretty happy with 3.5.2 being available for wheezy. Thanks! Just made my updates. For 3.5.2, do I still have to set cluster.self-heal-daemon to off?
Pranith.
2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm@xxxxxxxxxx>:
Pranith,
----- Original Message -----
| From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
| To: "Roman" <romeo.r@xxxxxxxxx>
| Cc: gluster-users@xxxxxxxxxxx, "Niels de Vos" <ndevos@xxxxxxxxxx>, "Humble Chirammal" <hchiramm@xxxxxxxxxx>
| Sent: Wednesday, August 6, 2014 12:09:57 PM
| Subject: Re: libgfapi failover problem on replica bricks
|
| Roman,
| The file went into split-brain. I think we should do these tests
| with 3.5.2, where monitoring the heals is easier. Let me also come up
| with a document about how to do this testing you are trying to do.
|
| Humble/Niels,
| Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
| issue where /usr/bin/glfsheal was not packaged along with the deb. I
| think that should be fixed now as well?
|
The 3.5.2 packages for Debian are not available yet. We are coordinating internally to get them processed.
I will update the list once they are available.
--Humble
| On 08/06/2014 11:52 AM, Roman wrote:
| > good morning,
| >
| > root@stor1:~# getfattr -d -m. -e hex
| > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| > getfattr: Removing leading '/' from absolute path names
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| > trusted.gfid=0x23c79523075a4158bea38078da570449
| >
| >
| >
| > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx
| > <mailto:pkarampu@xxxxxxxxxx>>:
| >
| > On 08/06/2014 11:30 AM, Roman wrote:
| >> Also, this time files are not the same!
| >>
| >> root@stor1:~# md5sum
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >> 32411360c53116b96a059f17306caeda
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>
| >> root@stor2:~# md5sum
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| > What is the getfattr output?
| >
| > Pranith
| >
| >>
| >>
| >> 2014-08-05 16:33 GMT+03:00 Roman <romeo.r@xxxxxxxxx
| >> <mailto:romeo.r@xxxxxxxxx>>:
| >> Nope, it is not working. But this time it went a bit differently.
| >>
| >> root@gluster-client:~# dmesg
| >> Segmentation fault
| >>
| >>
| >> I was not even able to start the VM after I had done the tests
| >>
| >> Could not read qcow2 header: Operation not permitted
| >>
| >> And it seems it never starts to sync files after the first
| >> disconnect. The VM survives the first disconnect, but not the second (I
| >> waited around 30 minutes). Also, I've
| >> got network.ping-timeout: 2 in the volume settings, but the logs
| >> reacted to the first disconnect only after around 30 seconds. The second was
| >> faster, 2 seconds.
| >>
| >> Reaction was different also:
| >>
| >> slower one:
| >> [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv]
| >> 0-glusterfs: readv failed (Connection timed out)
| >> [2014-08-05 13:26:19.558485] W
| >> [socket.c:1962:__socket_proto_state_machine] 0-glusterfs:
| >> reading from socket failed. Error (Connection timed out),
| >> peer (10.250.0.1:24007 <http://10.250.0.1:24007>)
| >> [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv]
| >> 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
| >> [2014-08-05 13:26:21.281474] W
| >> [socket.c:1962:__socket_proto_state_machine]
| >> 0-HA-fast-150G-PVE1-client-0: reading from socket failed.
| >> Error (Connection timed out), peer (10.250.0.1:49153
| >> <http://10.250.0.1:49153>)
| >> [2014-08-05 13:26:21.281507] I
| >> [client.c:2098:client_rpc_notify]
| >> 0-HA-fast-150G-PVE1-client-0: disconnected
| >>
| >> the fast one:
| >> [2014-08-05 12:52:44.607389] C
| >> [client-handshake.c:127:rpc_client_ping_timer_expired]
| >> 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
| >> <http://10.250.0.2:49153> has not responded in the last 2
| >> seconds, disconnecting.
| >> [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv]
| >> 0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
| >> [2014-08-05 12:52:44.607585] E
| >> [rpc-clnt.c:368:saved_frames_unwind]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >> [0x7fcb1b4b0558]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >> [0x7fcb1b4aea63]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >> unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at
| >> 2014-08-05 12:52:42.463881 (xid=0x381883x)
| >> [2014-08-05 12:52:44.607604] W
| >> [client-rpc-fops.c:2624:client3_3_lookup_cbk]
| >> 0-HA-fast-150G-PVE1-client-1: remote operation failed:
| >> Transport endpoint is not connected. Path: /
| >> (00000000-0000-0000-0000-000000000001)
| >> [2014-08-05 12:52:44.607736] E
| >> [rpc-clnt.c:368:saved_frames_unwind]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >> [0x7fcb1b4b0558]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >> [0x7fcb1b4aea63]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >> unwinding frame type(GlusterFS Handshake) op(PING(3)) called
| >> at 2014-08-05 12:52:42.463891 (xid=0x381884x)
| >> [2014-08-05 12:52:44.607753] W
| >> [client-handshake.c:276:client_ping_cbk]
| >> 0-HA-fast-150G-PVE1-client-1: timer must have expired
| >> [2014-08-05 12:52:44.607776] I
| >> [client.c:2098:client_rpc_notify]
| >> 0-HA-fast-150G-PVE1-client-1: disconnected
| >>
| >>
| >>
| >> I've got SSD disks (just for info).
| >> Should I go and give 3.5.2 a try?
| >>
| >>
| >>
| >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri
| >> <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >> Reply along with gluster-users please :-). Maybe you are
| >> hitting 'reply' instead of 'reply all'?
| >>
| >> Pranith
| >>
| >> On 08/05/2014 03:35 PM, Roman wrote:
| >>> To be sure and start clean, I've created another VM with raw
| >>> format and am going to repeat those steps. So now I've got
| >>> two VMs: one with qcow2 format and the other with raw
| >>> format. I will send another e-mail shortly.
| >>>
| >>>
| >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri
| >>> <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>>
| >>> On 08/05/2014 03:07 PM, Roman wrote:
| >>>> really, seems like the same file
| >>>>
| >>>> stor1:
| >>>> a951641c5230472929836f9fcede6b04
| >>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>> stor2:
| >>>> a951641c5230472929836f9fcede6b04
| >>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>>
| >>>> One thing I've seen from the logs is that somehow Proxmox
| >>>> VE is connecting to the servers with the wrong version?
| >>>> [2014-08-05 09:23:45.218550] I
| >>>> [client-handshake.c:1659:select_server_supported_programs]
| >>>> 0-HA-fast-150G-PVE1-client-0: Using Program
| >>>> GlusterFS 3.3, Num (1298437), Version (330)
| >>> It is the rpc (over the network data structures)
| >>> version, which is not changed at all from 3.3 so
| >>> that's not a problem. So what is the conclusion? Is
| >>> your test case working now or not?
| >>>
| >>> Pranith
| >>>
| >>>> but if I issue:
| >>>> root@pve1:~# glusterfs -V
| >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
| >>>> seems ok.
| >>>>
| >>>> the servers use 3.4.4 meanwhile:
| >>>> [2014-08-05 09:23:45.117875] I
| >>>> [server-handshake.c:567:server_setvolume]
| >>>> 0-HA-fast-150G-PVE1-server: accepted client from
| >>>> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
| >>>> (version: 3.4.4)
| >>>> [2014-08-05 09:23:49.103035] I
| >>>> [server-handshake.c:567:server_setvolume]
| >>>> 0-HA-fast-150G-PVE1-server: accepted client from
| >>>> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
| >>>> (version: 3.4.4)
| >>>>
| >>>> if this could be the reason, of course.
| >>>> I did restart the Proxmox VE yesterday (just for
| >>>> information)
| >>>>
| >>>>
| >>>>
| >>>>
| >>>>
| >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri
| >>>> <pkarampu@xxxxxxxxxx <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>
| >>>> On 08/05/2014 02:33 PM, Roman wrote:
| >>>>> Waited long enough for now, still different
| >>>>> sizes and no logs about healing :(
| >>>>>
| >>>>> stor1
| >>>>> # file:
| >>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>> root@stor1:~# du -sh
| >>>>> /exports/fast-test/150G/images/127/
| >>>>> 1.2G /exports/fast-test/150G/images/127/
| >>>>>
| >>>>>
| >>>>> stor2
| >>>>> # file:
| >>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
| >>>>>
| >>>>>
| >>>>> root@stor2:~# du -sh
| >>>>> /exports/fast-test/150G/images/127/
| >>>>> 1.4G /exports/fast-test/150G/images/127/
| >>>> According to the changelogs, the file doesn't
| >>>> need any healing. Could you stop the operations
| >>>> on the VMs and take md5sum on both these machines?
| >>>>
| >>>> Pranith
| >>>>
| >>>>>
| >>>>>
| >>>>>
| >>>>>
| >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar
| >>>>> Karampuri <pkarampu@xxxxxxxxxx
| >>>>> <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>
| >>>>> On 08/05/2014 02:06 PM, Roman wrote:
| >>>>>> Well, it seems like it doesn't see that
| >>>>>> changes were made to the volume? I
| >>>>>> created two files 200 and 100 MB (from
| >>>>>> /dev/zero) after I disconnected the first
| >>>>>> brick. Then connected it back and got
| >>>>>> these logs:
| >>>>>>
| >>>>>> [2014-08-05 08:30:37.830150] I
| >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>> 0-glusterfs: No change in volfile, continuing
| >>>>>> [2014-08-05 08:30:37.830207] I
| >>>>>> [rpc-clnt.c:1676:rpc_clnt_reconfig]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: changing
| >>>>>> port to 49153 (from 0)
| >>>>>> [2014-08-05 08:30:37.830239] W
| >>>>>> [socket.c:514:__socket_rwv]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: readv
| >>>>>> failed (No data available)
| >>>>>> [2014-08-05 08:30:37.831024] I
| >>>>>> [client-handshake.c:1659:select_server_supported_programs]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Using
| >>>>>> Program GlusterFS 3.3, Num (1298437),
| >>>>>> Version (330)
| >>>>>> [2014-08-05 08:30:37.831375] I
| >>>>>> [client-handshake.c:1456:client_setvolume_cbk]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Connected
| >>>>>> to 10.250.0.1:49153 <http://10.250.0.1:49153>, attached to
| >>>>>> remote volume '/exports/fast-test/150G'.
| >>>>>> [2014-08-05 08:30:37.831394] I
| >>>>>> [client-handshake.c:1468:client_setvolume_cbk]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Server and
| >>>>>> Client lk-version numbers are not same,
| >>>>>> reopening the fds
| >>>>>> [2014-08-05 08:30:37.831566] I
| >>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Server lk
| >>>>>> version = 1
| >>>>>>
| >>>>>>
| >>>>>> [2014-08-05 08:30:37.830150] I
| >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>> 0-glusterfs: No change in volfile, continuing
| >>>>>> this line seems weird to me tbh.
| >>>>>> I do not see any traffic on the switch
| >>>>>> interfaces between the gluster servers, which
| >>>>>> means there is no syncing between them.
| >>>>>> I tried to ls -l the files on the client
| >>>>>> and servers to trigger the healing, but
| >>>>>> with no success, it seems. Should I wait more?
| >>>>> Yes, it should take around 10-15 minutes.
| >>>>> Could you provide 'getfattr -d -m. -e hex
| >>>>> <file-on-brick>' on both the bricks.
| >>>>>
| >>>>> Pranith
| >>>>>
| >>>>>>
| >>>>>>
| >>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar
| >>>>>> Karampuri <pkarampu@xxxxxxxxxx
| >>>>>> <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>
| >>>>>> On 08/05/2014 01:10 PM, Roman wrote:
| >>>>>>> Ahha! For some reason I was not able
| >>>>>>> to start the VM anymore; Proxmox VE
| >>>>>>> told me that it was not able to read
| >>>>>>> the qcow2 header because permission
| >>>>>>> was denied for some reason. So I just
| >>>>>>> deleted that file and created a new
| >>>>>>> VM. And the next message I've got was
| >>>>>>> this:
| >>>>>> Seems like these are the messages from
| >>>>>> where you took down the bricks before
| >>>>>> self-heal completed. Could you restart the run,
| >>>>>> waiting for self-heals to complete
| >>>>>> before taking down the next brick?
| >>>>>>
| >>>>>> Pranith
| >>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>> [2014-08-05 07:31:25.663412] E
| >>>>>>> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
| >>>>>>> 0-HA-fast-150G-PVE1-replicate-0:
| >>>>>>> Unable to self-heal contents of
| >>>>>>> '/images/124/vm-124-disk-1.qcow2'
| >>>>>>> (possible split-brain). Please
| >>>>>>> delete the file from all but the
| >>>>>>> preferred subvolume.- Pending
| >>>>>>> matrix: [ [ 0 60 ] [ 11 0 ] ]
| >>>>>>> [2014-08-05 07:31:25.663955] E
| >>>>>>> [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
| >>>>>>> 0-HA-fast-150G-PVE1-replicate-0:
| >>>>>>> background data self-heal failed on
| >>>>>>> /images/124/vm-124-disk-1.qcow2
| >>>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith
| >>>>>>> Kumar Karampuri <pkarampu@xxxxxxxxxx
| >>>>>>> <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>> I just responded to your earlier
| >>>>>>> mail about how the log looks.
| >>>>>>> The log appears in the mount's logfile.
| >>>>>>>
| >>>>>>> Pranith
| >>>>>>>
| >>>>>>> On 08/05/2014 12:41 PM, Roman wrote:
| >>>>>>>> Ok, so I've waited enough, I
| >>>>>>>> think. There was no traffic at all on
| >>>>>>>> the switch ports between the servers.
| >>>>>>>> Could not find any suitable log
| >>>>>>>> message about a completed
| >>>>>>>> self-heal (waited about 30
| >>>>>>>> minutes). Pulled out the other
| >>>>>>>> server's UTP cable this time
| >>>>>>>> and got into the same situation:
| >>>>>>>> root@gluster-test1:~# cat
| >>>>>>>> /var/log/dmesg
| >>>>>>>> -bash: /bin/cat: Input/output error
| >>>>>>>>
| >>>>>>>> brick logs:
| >>>>>>>> [2014-08-05 07:09:03.005474] I
| >>>>>>>> [server.c:762:server_rpc_notify]
| >>>>>>>> 0-HA-fast-150G-PVE1-server:
| >>>>>>>> disconnecting connection from
| >>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>> [2014-08-05 07:09:03.005530] I
| >>>>>>>> [server-helpers.c:729:server_connection_put]
| >>>>>>>> 0-HA-fast-150G-PVE1-server:
| >>>>>>>> Shutting down connection
| >>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>> [2014-08-05 07:09:03.005560] I
| >>>>>>>> [server-helpers.c:463:do_fd_cleanup]
| >>>>>>>> 0-HA-fast-150G-PVE1-server: fd
| >>>>>>>> cleanup on
| >>>>>>>> /images/124/vm-124-disk-1.qcow2
| >>>>>>>> [2014-08-05 07:09:03.005797] I
| >>>>>>>> [server-helpers.c:617:server_connection_destroy]
| >>>>>>>> 0-HA-fast-150G-PVE1-server:
| >>>>>>>> destroyed connection of
| >>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>> 2014-08-05 9:53 GMT+03:00
| >>>>>>>> Pranith Kumar Karampuri
| >>>>>>>> <pkarampu@xxxxxxxxxx
| >>>>>>>> <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>> Do you think it is possible
| >>>>>>>> for you to do these tests
| >>>>>>>> on the latest version
| >>>>>>>> 3.5.2? 'gluster volume heal
| >>>>>>>> <volname> info' would give
| >>>>>>>> you that information in
| >>>>>>>> versions > 3.5.1.
| >>>>>>>> Otherwise you will have to
| >>>>>>>> check it either from the
| >>>>>>>> logs (there will be a
| >>>>>>>> self-heal completed message
| >>>>>>>> in the mount logs) or by
| >>>>>>>> observing 'getfattr -d -m.
| >>>>>>>> -e hex <image-file-on-bricks>'
| >>>>>>>>
| >>>>>>>> Pranith
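For reference, a sketch of those two checks on 3.4.x; the brick path is the one used earlier in this thread, and the mount log name is an assumption based on the mount point naming seen above:

# on the client: look for the heal completion message in the mount log
grep -i "self heal" /var/log/glusterfs/mnt-pve-HA-fast-150G-PVE1.log
# on each brick server: the trusted.afr pending counters should go back to all zeroes
getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2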
| >>>>>>>>
| >>>>>>>>
| >>>>>>>> On 08/05/2014 12:09 PM,
| >>>>>>>> Roman wrote:
| >>>>>>>>> Ok, I understand. I will
| >>>>>>>>> try this shortly.
| >>>>>>>>> How can I be sure that the
| >>>>>>>>> healing process is done
| >>>>>>>>> if I am not able to see
| >>>>>>>>> its status?
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>> 2014-08-05 9:30 GMT+03:00
| >>>>>>>>> Pranith Kumar Karampuri
| >>>>>>>>> <pkarampu@xxxxxxxxxx
| >>>>>>>>> <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>>> Mounts will do the
| >>>>>>>>> healing, not the
| >>>>>>>>> self-heal-daemon. The
| >>>>>>>>> problem I feel is that
| >>>>>>>>> whichever process does
| >>>>>>>>> the healing needs the
| >>>>>>>>> latest information
| >>>>>>>>> about the good bricks
| >>>>>>>>> in this usecase. Since
| >>>>>>>>> for VM usecase, mounts
| >>>>>>>>> should have the latest
| >>>>>>>>> information, we should
| >>>>>>>>> let the mounts do the
| >>>>>>>>> healing. If the mount
| >>>>>>>>> accesses the VM image
| >>>>>>>>> either by someone
| >>>>>>>>> doing operations
| >>>>>>>>> inside the VM or
| >>>>>>>>> an explicit stat on the
| >>>>>>>>> file, it should do the
| >>>>>>>>> healing.
| >>>>>>>>>
| >>>>>>>>> Pranith.
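In practice that means an access from the fuse mount should kick off the heal, for example (the /mnt/pve path is an assumption based on the Proxmox mount naming seen earlier):

# an explicit lookup/stat of the image from the client mount triggers the heal
stat /mnt/pve/HA-fast-150G-PVE1/images/127/vm-127-disk-1.qcow2
# or walk the whole mount to trigger heals for every file
find /mnt/pve/HA-fast-150G-PVE1 -noleaf -print0 | xargs -0 stat > /dev/null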
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>> On 08/05/2014 10:39
| >>>>>>>>> AM, Roman wrote:
| >>>>>>>>>> Hmmm, you told me to
| >>>>>>>>>> turn it off. Did I
| >>>>>>>>>> understand something
| >>>>>>>>>> wrong? After I issued
| >>>>>>>>>> the command you've
| >>>>>>>>>> sent me, I was not
| >>>>>>>>>> able to watch the
| >>>>>>>>>> healing process; it
| >>>>>>>>>> said it won't be
| >>>>>>>>>> healed because it's
| >>>>>>>>>> turned off.
| >>>>>>>>>>
| >>>>>>>>>>
| >>>>>>>>>> 2014-08-05 5:39
| >>>>>>>>>> GMT+03:00 Pranith
| >>>>>>>>>> Kumar Karampuri
| >>>>>>>>>> <pkarampu@xxxxxxxxxx
| >>>>>>>>>> <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>>>> You didn't
| >>>>>>>>>> mention anything
| >>>>>>>>>> about
| >>>>>>>>>> self-healing. Did
| >>>>>>>>>> you wait until
| >>>>>>>>>> the self-heal is
| >>>>>>>>>> complete?
| >>>>>>>>>>
| >>>>>>>>>> Pranith
| >>>>>>>>>>
| >>>>>>>>>> On 08/04/2014
| >>>>>>>>>> 05:49 PM, Roman
| >>>>>>>>>> wrote:
| >>>>>>>>>>> Hi!
| >>>>>>>>>>> The result is pretty much the
| >>>>>>>>>>> same. I set the
| >>>>>>>>>>> switch port down
| >>>>>>>>>>> for the 1st server;
| >>>>>>>>>>> it was ok. Then I
| >>>>>>>>>>> set it back up
| >>>>>>>>>>> and set the other
| >>>>>>>>>>> server's port
| >>>>>>>>>>> off, and it
| >>>>>>>>>>> triggered IO
| >>>>>>>>>>> errors on two
| >>>>>>>>>>> virtual
| >>>>>>>>>>> machines: one
| >>>>>>>>>>> with a local root
| >>>>>>>>>>> FS but network
| >>>>>>>>>>> mounted storage,
| >>>>>>>>>>> and the other with a
| >>>>>>>>>>> network root FS.
| >>>>>>>>>>> The 1st gave an
| >>>>>>>>>>> error on copying
| >>>>>>>>>>> to or from the
| >>>>>>>>>>> mounted network
| >>>>>>>>>>> disk; the other just
| >>>>>>>>>>> gave me an error
| >>>>>>>>>>> for even reading
| >>>>>>>>>>> log files.
| >>>>>>>>>>>
| >>>>>>>>>>> cat:
| >>>>>>>>>>> /var/log/alternatives.log:
| >>>>>>>>>>> Input/output error
| >>>>>>>>>>> Then I reset the
| >>>>>>>>>>> KVM VM and it
| >>>>>>>>>>> told me there
| >>>>>>>>>>> is no boot
| >>>>>>>>>>> device. Next I
| >>>>>>>>>>> virtually
| >>>>>>>>>>> powered it off
| >>>>>>>>>>> and then back on
| >>>>>>>>>>> and it has booted.
| >>>>>>>>>>>
| >>>>>>>>>>> By the way, did
| >>>>>>>>>>> I have to
| >>>>>>>>>>> start/stop volume?
| >>>>>>>>>>>
| >>>>>>>>>>> >> Could you do
| >>>>>>>>>>> the following
| >>>>>>>>>>> and test it again?
| >>>>>>>>>>> >> gluster volume
| >>>>>>>>>>> set <volname>
| >>>>>>>>>>> cluster.self-heal-daemon
| >>>>>>>>>>> off
| >>>>>>>>>>>
| >>>>>>>>>>> >>Pranith
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>> 2014-08-04 14:10
| >>>>>>>>>>> GMT+03:00
| >>>>>>>>>>> Pranith Kumar
| >>>>>>>>>>> Karampuri
| >>>>>>>>>>> <pkarampu@xxxxxxxxxx
| >>>>>>>>>>> <mailto:pkarampu@xxxxxxxxxx>>:
| >>>>>>>>>>>
| >>>>>>>>>>> On
| >>>>>>>>>>> 08/04/2014
| >>>>>>>>>>> 03:33 PM,
| >>>>>>>>>>> Roman wrote:
| >>>>>>>>>>>> Hello!
| >>>>>>>>>>>>
| >>>>>>>>>>>> Facing the
| >>>>>>>>>>>> same
| >>>>>>>>>>>> problem as
| >>>>>>>>>>>> mentioned
| >>>>>>>>>>>> here:
| >>>>>>>>>>>>
| >>>>>>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
| >>>>>>>>>>>>
| >>>>>>>>>>>> My setup
| >>>>>>>>>>>> is up and
| >>>>>>>>>>>> running, so
| >>>>>>>>>>>> I'm ready
| >>>>>>>>>>>> to help you
| >>>>>>>>>>>> back with
| >>>>>>>>>>>> feedback.
| >>>>>>>>>>>>
| >>>>>>>>>>>> setup:
| >>>>>>>>>>>> proxmox
| >>>>>>>>>>>> server as
| >>>>>>>>>>>> client
| >>>>>>>>>>>> 2 gluster
| >>>>>>>>>>>> physical
| >>>>>>>>>>>> servers
| >>>>>>>>>>>>
| >>>>>>>>>>>> server side
| >>>>>>>>>>>> and client
| >>>>>>>>>>>> side both
| >>>>>>>>>>>> running atm
| >>>>>>>>>>>> 3.4.4
| >>>>>>>>>>>> glusterfs
| >>>>>>>>>>>> from
| >>>>>>>>>>>> gluster repo.
| >>>>>>>>>>>>
| >>>>>>>>>>>> the problem is:
| >>>>>>>>>>>>
| >>>>>>>>>>>> 1. created
| >>>>>>>>>>>> replica bricks.
| >>>>>>>>>>>> 2. mounted
| >>>>>>>>>>>> in proxmox
| >>>>>>>>>>>> (tried both
| >>>>>>>>>>>> Proxmox
| >>>>>>>>>>>> ways: via
| >>>>>>>>>>>> GUI and
| >>>>>>>>>>>> fstab (with
| >>>>>>>>>>>> backup
| >>>>>>>>>>>> volume
| >>>>>>>>>>>> line), btw
| >>>>>>>>>>>> while
| >>>>>>>>>>>> mounting
| >>>>>>>>>>>> via fstab
| >>>>>>>>>>>> I'm unable
| >>>>>>>>>>>> to launch a
| >>>>>>>>>>>> VM without
| >>>>>>>>>>>> cache,
| >>>>>>>>>>>> meanwhile
| >>>>>>>>>>>> direct-io-mode
| >>>>>>>>>>>> is enabled
| >>>>>>>>>>>> in fstab line)
| >>>>>>>>>>>> 3. installed VM
| >>>>>>>>>>>> 4. bring
| >>>>>>>>>>>> one volume
| >>>>>>>>>>>> down - ok
| >>>>>>>>>>>> 5. bring it
| >>>>>>>>>>>> up, wait until
| >>>>>>>>>>>> the sync is
| >>>>>>>>>>>> done.
| >>>>>>>>>>>> 6. bring
| >>>>>>>>>>>> other
| >>>>>>>>>>>> volume down
| >>>>>>>>>>>> - getting
| >>>>>>>>>>>> IO errors
| >>>>>>>>>>>> on VM guest
| >>>>>>>>>>>> and not
| >>>>>>>>>>>> able to
| >>>>>>>>>>>> restore the
| >>>>>>>>>>>> VM after I
| >>>>>>>>>>>> reset the
| >>>>>>>>>>>> VM via
| >>>>>>>>>>>> host. It
| >>>>>>>>>>>> says (no
| >>>>>>>>>>>> bootable
| >>>>>>>>>>>> media).
| >>>>>>>>>>>> After I
| >>>>>>>>>>>> shut it
| >>>>>>>>>>>> down
| >>>>>>>>>>>> (forced)
| >>>>>>>>>>>> and bring
| >>>>>>>>>>>> back up, it
| >>>>>>>>>>>> boots.
| >>>>>>>>>>> Could you do
| >>>>>>>>>>> the
| >>>>>>>>>>> following
| >>>>>>>>>>> and test it
| >>>>>>>>>>> again?
| >>>>>>>>>>> gluster
| >>>>>>>>>>> volume set
| >>>>>>>>>>> <volname>
| >>>>>>>>>>> cluster.self-heal-daemon
| >>>>>>>>>>> off
| >>>>>>>>>>>
| >>>>>>>>>>> Pranith
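For reference, a sketch of that test cycle with the daemon disabled; the volume name is the one that appears elsewhere in this thread, the interface name is an assumption, and on 3.4.x the final heal-info command is not available, so watch the mount log instead:

gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
# take the first brick away (downing the switch port or killing the brick process works too)
ip link set dev eth1 down    # on stor1, interface name assumed
# ...write inside the VM, then bring the link back...
ip link set dev eth1 up
# wait for the client-side heal to finish before touching the second brick
gluster volume heal HA-fast-150G-PVE1 info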
| >>>>>>>>>>>>
| >>>>>>>>>>>> Need help.
| >>>>>>>>>>>> Tried
| >>>>>>>>>>>> 3.4.3, 3.4.4.
| >>>>>>>>>>>> Still
| >>>>>>>>>>>> missing
| >>>>>>>>>>>> packages for
| >>>>>>>>>>>> 3.4.5 for
| >>>>>>>>>>>> Debian and
| >>>>>>>>>>>> 3.5.2
| >>>>>>>>>>>> (3.5.1
| >>>>>>>>>>>> always
| >>>>>>>>>>>> gives a
| >>>>>>>>>>>> healing
| >>>>>>>>>>>> error for
| >>>>>>>>>>>> some reason)
| >>>>>>>>>>>>
| >>>>>>>>>>>> --
| >>>>>>>>>>>> Best regards,
| >>>>>>>>>>>> Roman.
| >>>>>>>>>>>>
| >>>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>> --
| >>>>>>>>>>> Best regards,
| >>>>>>>>>>> Roman.
| >>>>>>>>>>
| >>>>>>>>>>
| >>>>>>>>>>
| >>>>>>>>>>
| >>>>>>>>>> --
| >>>>>>>>>> Best regards,
| >>>>>>>>>> Roman.
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>> --
| >>>>>>>>> Best regards,
| >>>>>>>>> Roman.
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>> --
| >>>>>>>> Best regards,
| >>>>>>>> Roman.
| >>>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>> --
| >>>>>>> Best regards,
| >>>>>>> Roman.
| >>>>>>
| >>>>>>
| >>>>>>
| >>>>>>
| >>>>>> --
| >>>>>> Best regards,
| >>>>>> Roman.
| >>>>>
| >>>>>
| >>>>>
| >>>>>
| >>>>> --
| >>>>> Best regards,
| >>>>> Roman.
| >>>>
| >>>>
| >>>>
| >>>>
| >>>> --
| >>>> Best regards,
| >>>> Roman.
| >>>
| >>>
| >>>
| >>>
| >>> --
| >>> Best regards,
| >>> Roman.
| >>
| >>
| >>
| >>
| >> --
| >> Best regards,
| >> Roman.
| >>
| >>
| >>
| >>
| >> --
| >> Best regards,
| >> Roman.
| >
| >
| >
| >
| > --
| > Best regards,
| > Roman.
|
|
--
Best regards,
Roman.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users