Re: Freezing during heal

Could you share the client logs and information about the approx time/day when you saw this issue?

-Krutika

On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
Hi,

We have a small GlusterFS 3.7.6 cluster with 3 nodes, running Proxmox VMs on it. I did set up the recommended options like the virt group, but
by hand since it's on Debian. The shards are 256MB, if that matters.

This morning the second node crashed, and as it came back up it started a heal, which basically froze all the VMs running on that volume. Since we really
can't have 40 minutes of downtime in the middle of the day, I just removed the node from the network; that stopped the heal and allowed the VMs to access
their disks again. The plan was to re-connect the node in a couple of hours and let it heal at night.
But a VM has crashed now, and it can't boot up again: it seems to freeze trying to access its disks.

Looking at the heal info for the volume, the count has gone way up since this morning; it looks like the VMs aren't writing to both nodes, just the one they are on.
That seems pretty bad. We have 2 of 3 nodes up, so I would expect the volume to work just fine since it has quorum. What am I missing?

It is still too early to start the heal. Is there a way to start the VM anyway right now? I mean, it was running a moment ago, so the data is there; it just needs
to let the VM access it.



Volume Name: vm-storage
Type: Replicate
Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: first_node:/mnt/vg1-storage
Brick2: second_node:/mnt/vg1-storage
Brick3: third_node:/mnt/vg1-storage
Options Reconfigured:
cluster.quorum-type: auto
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
features.shard: on
features.shard-block-size: 256MB
cluster.server-quorum-ratio: 51%
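[Editor's note: for context on the quorum question above: with a 1 x 3 replicate volume and cluster.quorum-type set to auto, client writes should succeed as long as 2 of the 3 bricks remain reachable. A few standard gluster CLI commands (available in 3.7) that can help confirm what the cluster actually sees; the volume name matches the one in this report:]

```shell
# Per-brick count of entries still pending heal
gluster volume heal vm-storage statistics heal-count

# List the specific files/shards that need healing
gluster volume heal vm-storage info

# Check which bricks and self-heal daemons are currently up
gluster volume status vm-storage
```

If heal-count keeps growing while all bricks report online, that would suggest clients have lost their connection to one brick rather than a quorum problem, which the client logs should confirm.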


Thanks for your help

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
