Re: libgfapi failover problem on replica bricks

Sorry, "gluster-users" somehow dropped off the recipient list, so I'm replying to it with the full history.
I'm watching the mount's logfile with the tail -f command and I can't see any such messages... it seems to be taking forever. Roughly how long should self-heal take to complete? The mount is almost empty; the only thing on it is a striped file with the VM image.
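For reference, the command I'm running is roughly this (the log path is just the default GlusterFS FUSE client log name for a mount at /mnt/pve/HA-fast-150G-PVE1; the actual mount point on my setup may differ):

tail -f /var/log/glusterfs/mnt-pve-HA-fast-150G-PVE1.log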

The only logs I see are:

[2014-08-05 07:12:03.808352] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor2-31563-2014/08/05-06:10:19:381800-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
[2014-08-05 07:12:04.547935] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from sisemon-262292-2014/08/04-13:27:19:221777-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
[2014-08-05 07:12:06.761596] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
[2014-08-05 07:12:09.151322] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from pve1-27476-2014/08/04-13:27:19:838805-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)



2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
I just responded to your earlier mail about what the log message looks like. It appears in the mount's logfile.

Pranith

On 08/05/2014 12:41 PM, Roman wrote:
OK, I think I've waited long enough. There was no traffic at all on the switch ports between the servers, and I could not find any log message about a completed self-heal (I waited about 30 minutes). This time I pulled out the other server's UTP cable and ended up in the same situation:
root@gluster-test1:~# cat /var/log/dmesg
-bash: /bin/cat: Input/output error

brick logs:
[2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
[2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0





2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
Do you think it is possible for you to do these tests on the latest version 3.5.2? 'gluster volume heal <volname> info' would give you that information in versions > 3.5.1.
Otherwise you will have to check it either from the logs (there will be a "self-heal completed" message in the mount logs) or by observing 'getfattr -d -m. -e hex <image-file-on-bricks>'.
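For example, roughly like this (the volume name below is taken from your logs, and <brick-path> is just a placeholder for wherever the brick lives on each server):

# on 3.5.2 you can simply ask gluster:
gluster volume heal HA-fast-150G-PVE1 info

# or check the AFR changelog xattrs directly on each brick server:
getfattr -d -m. -e hex <brick-path>/images/124/vm-124-disk-1.qcow2

Roughly speaking, once the trusted.afr.HA-fast-150G-PVE1-client-* values are all zeroes on both bricks (for example 0x000000000000000000000000), there is nothing left pending to heal for that file.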

Pranith


On 08/05/2014 12:09 PM, Roman wrote:
Ok, I understand. I will try this shortly.
How can I be sure that the healing process is done if I am not able to see its status?


2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
Mounts will do the healing, not the self-heal daemon. The point is that whichever process does the healing needs to have the latest information about which bricks are good, and in the VM use case it is the mounts that have that information, so we should let the mounts do the healing. If the mount accesses the VM image, either because someone performs operations inside the VM or because of an explicit stat on the file, it will do the healing.
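In other words, something like this from the client side is enough to kick off the heal for that file (the mount path here is just an example; use wherever the volume is actually mounted):

stat /mnt/pve/HA-fast-150G-PVE1/images/124/vm-124-disk-1.qcow2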

Pranith.


On 08/05/2014 10:39 AM, Roman wrote:
Hmmm, you told me to turn it off. Did I understand something wrong? After I issued the command you sent me, I was not able to watch the healing process; it said the file won't be healed because it is turned off.


2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:
You didn't mention anything about self-healing. Did you wait until the self-heal is complete?

Pranith

On 08/04/2014 05:49 PM, Roman wrote:
Hi!
The result is pretty much the same. I set the switch port down for the 1st server, and that was OK. Then I brought it back up and set the other server's port down, and that triggered IO errors on two virtual machines: one with a local root FS but network-mounted storage, and the other with a network root FS. The 1st gave an error when copying to or from the mounted network disk; the other gave an error even when just reading log files.

cat: /var/log/alternatives.log: Input/output error
Then I reset the KVM VM and it told me there is no boot device. Next I virtually powered it off and back on, and it booted.

By the way, did I have to start/stop the volume?

>> Could you do the following and test it again?
>> gluster volume set <volname> cluster.self-heal-daemon off

>> Pranith




2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>:

On 08/04/2014 03:33 PM, Roman wrote:
Hello!

Facing the same problem as mentioned here:


My setup is up and running, so I'm ready to help you with feedback in return.

Setup:
Proxmox server as the client
2 physical GlusterFS servers

Both the server side and the client side are currently running GlusterFS 3.4.4 from the Gluster repo.

The problem is:

1. Created replica bricks.
2. Mounted them in Proxmox (tried both Proxmox ways: via the GUI and via fstab with the backup volume line; by the way, while mounting via fstab I'm unable to launch a VM without cache, even though direct-io-mode is enabled in the fstab line - see the example fstab line right after this list).
3. Installed a VM.
4. Brought one volume down - OK.
5. Brought it back up and waited for the sync to finish.
6. Brought the other volume down - got IO errors on the VM guest and was not able to restore the VM after resetting it via the host. It says "no bootable media". After I shut it down (forced) and brought it back up, it boots.
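This is roughly what my fstab line looks like (the hostnames and mount point are from my setup and may differ; backupvolfile-server and direct-io-mode are the options I mean by the backup volume line and direct IO):

stor1:/HA-fast-150G-PVE1 /mnt/pve/HA-fast-150G-PVE1 glusterfs defaults,_netdev,backupvolfile-server=stor2,direct-io-mode=enable 0 0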
Could you do the following and test it again?
gluster volume set <volname> cluster.self-heal-daemon off
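For example (the volume name is taken from your logs; once it is set, the option should show up in the output of 'gluster volume info'):

gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
gluster volume info HA-fast-150G-PVE1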

Pranith

I need help. I've tried 3.4.3 and 3.4.4.
Debian packages are still missing for 3.4.5 and 3.5.2 (3.5.1 always gives a healing error for some reason).

--
Best regards,
Roman.


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
