Re: Freezing during heal

Could you share the client logs and information about the approx time/day when you saw this issue?

-Krutika

On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
Hi,

We have a small GlusterFS 3.7.6 cluster with 3 nodes, running Proxmox VMs on it. I did set up the recommended options like the virt group, but
by hand since it's on Debian. The shards are 256MB, if that matters.

This morning the second node crashed, and as it came back up it started a heal, which basically froze all the VMs running on that volume. Since we really
can't have 40 minutes of downtime in the middle of the day, I just removed the node from the network; that stopped the heal and allowed the VMs to access
their disks again. The plan was to re-connect the node in a couple of hours and let it heal at night.
But a VM has crashed now, and it can't boot up again: it seems to freeze trying to access its disks.

Looking at the heal info for the volume, the count has gone way up since this morning; it looks like the VMs aren't writing to both nodes, just the one they are on.
That seems pretty bad. We have 2 of 3 nodes up, so I would expect the volume to work just fine since it has quorum. What am I missing?

It is still too early to start the heal. Is there a way to start the VM anyway right now? I mean, it was running a moment ago, so the data is there; it just needs
to let the VM access it.



Volume Name: vm-storage
Type: Replicate
Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: first_node:/mnt/vg1-storage
Brick2: second_node:/mnt/vg1-storage
Brick3: third_node:/mnt/vg1-storage
Options Reconfigured:
cluster.quorum-type: auto
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
features.shard: on
features.shard-block-size: 256MB
cluster.server-quorum-ratio: 51%
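[Editor's note: for context on the quorum question above: with a 1 x 3 replicate volume and cluster.quorum-type set to auto, client writes should succeed as long as 2 of the 3 bricks remain reachable. A few standard gluster CLI commands (available in 3.7) that can help confirm what the cluster actually sees; the volume name matches the one in this report:]

```shell
# Per-brick count of entries still pending heal
gluster volume heal vm-storage statistics heal-count

# List the specific files/shards that need healing
gluster volume heal vm-storage info

# Check which bricks and self-heal daemons are currently up
gluster volume status vm-storage
```

If heal-count keeps growing while all bricks report online, that would suggest clients have lost their connection to one brick rather than a quorum problem, which the client logs should confirm.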


Thanks for your help

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
