healing never ends (or never starts?) on replicated volume with virtual block device

Hi,

Another stupid/interesting situation:

root@stor1:~# gluster volume heal HA-WIN-TT-1T info
Brick stor1:/exports/NFS-WIN/1T/
/disk - Possibly undergoing heal
Number of entries: 1

Brick stor2:/exports/NFS-WIN/1T/
/test
/disk - Possibly undergoing heal
Number of entries: 2
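
If it helps with diagnosis, these are the next checks I plan to run (assuming the split-brain and statistics subcommands are available in my gluster version):

root@stor1:~# gluster volume heal HA-WIN-TT-1T info split-brain
root@stor1:~# gluster volume heal HA-WIN-TT-1T statistics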

For testing, I brought down the stor1 port on the switch and then brought it back up.
One of the volumes (the one holding virtual machines) then reconnected and healed successfully,
while the other one has been reporting a heal in progress for about 2 hours now, even though there is no traffic between the servers or between client and server.

/test is a simple new file I created while stor1 was down.
/disk is a simple 900 GB virtual block device (a sparse file made from /dev/null) that is mounted on a Windows server via iscsitarget :). It seems it will keep healing forever, as if it cannot decide which copy of the file is the right one.
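
For reference, the sparse file was created with something like this (reconstructing from memory; the mount point path is just an example, and the exact command may have differed):

root@stor1:~# dd if=/dev/null of=/mnt/HA-WIN-TT-1T/disk bs=1 count=0 seek=900G

And if it would help, I can dump the AFR changelog xattrs for the file directly from both bricks; the trusted.afr.* keys should show whether the pending counters are still moving or stuck:

root@stor1:~# getfattr -d -m . -e hex /exports/NFS-WIN/1T/disk
root@stor2:~# getfattr -d -m . -e hex /exports/NFS-WIN/1T/disk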

Logs from the gluster client machine, where the volume for the iSCSI target is mounted:
[2014-11-06 08:19:36.949092] W [client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0: remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:36.949148] W [client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0: remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:36.951202] W [client-rpc-fops.c:1580:client3_3_finodelk_cbk] 0-HA-WIN-TT-1T-client-0: remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:57.682937] W [socket.c:522:__socket_rwv] 0-glusterfs: readv on 10.250.0.1:24007 failed (Connection timed out)
[2014-11-06 08:20:17.950981] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
[2014-11-06 08:20:40.062928] E [socket.c:2161:socket_connect_finish] 0-HA-WIN-TT-1T-client-0: connection to 10.250.0.1:24007 failed (Connection timed out)
[2014-11-06 08:30:15.638197] W [dht-diskusage.c:232:dht_is_subvol_filled] 0-HA-WIN-TT-1T-dht: disk space on subvolume 'HA-WIN-TT-1T-replicate-0' is getting full (95.00 %), consider adding more nodes
[2014-11-06 08:36:18.385659] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-11-06 08:36:18.386573] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-11-06 08:36:18.387182] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-11-06 08:36:18.387414] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Connected to 10.250.0.1:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-11-06 08:36:18.387433] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-11-06 08:36:18.387446] I [client-handshake.c:1314:client_post_handshake] 0-HA-WIN-TT-1T-client-0: 1 fds open - Delaying child_up until they are re-opened
[2014-11-06 08:36:18.387730] I [client-handshake.c:936:client_child_up_reopen_done] 0-HA-WIN-TT-1T-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-11-06 08:36:18.387862] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0: Server lk version = 1

Brick log on stor1:

[2014-11-06 08:38:04.269503] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-11-06 08:38:04.269908] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Connected to 10.250.0.2:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-11-06 08:38:04.269962] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-11-06 08:38:04.270560] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1: Server lk version = 1
[2014-11-06 08:39:33.277219] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 08:49:33.327786] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 08:59:33.375835] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:09:33.430726] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:19:33.486488] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:29:33.541596] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:39:33.595242] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:49:33.648526] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:59:33.702368] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:09:33.756633] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:19:33.810984] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:29:33.865172] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:39:33.918765] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:49:33.973283] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:59:34.028836] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0

The same messages appear on stor2.
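
Would it be safe to just kick off a full heal at this point, or at least to confirm the self-heal daemons are alive on both nodes? I haven't run either yet, since I'm not sure what a forced heal would do to the in-use iSCSI image:

root@stor1:~# gluster volume status HA-WIN-TT-1T
root@stor1:~# gluster volume heal HA-WIN-TT-1T full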

--
Best regards,
Roman.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
