Re: Freezing during heal

Hi,

Yeah, so the fuse mount log didn't convey much information.

So one of the reasons the heal may have taken so long (and also consumed so many resources) is a bug in self-heal where, in 3-way replication, it would perform the heal from both source bricks. With that bug, a heal takes twice as long and consumes twice the resources.

This issue is fixed at http://review.gluster.org/#/c/14008/ and will be available in 3.7.12.

The other thing you could do is set cluster.data-self-heal-algorithm to 'full', for better heal performance and more predictable resource consumption during the heal:
 #gluster volume set <VOL> cluster.data-self-heal-algorithm full
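
You can verify that the option took effect with 'gluster volume info <VOL>' (volume name is a placeholder); once set, it should show up under "Options Reconfigured:", just like the options in the volume info you posted, e.g.:

 # gluster volume info <VOL> | grep data-self-heal-algorithm
 cluster.data-self-heal-algorithm: full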

As far as sharding is concerned, some critical caching issues were fixed in 3.7.7 and 3.7.8.
My guess is that the VM crash/unbootable state could be due to those issues, which exist in 3.7.6.

3.7.10 introduced throttled client-side heals, which also moves such heals to the background; this is all the more helpful for preventing starvation of VMs during client-side heal.
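
Whichever version you land on, you can keep an eye on heal progress from any node to check that the background heals are keeping up; for example (volume name is a placeholder again):

 # gluster volume heal <VOL> info
 # gluster volume heal <VOL> statistics heal-count

The first lists the entries still pending heal on each brick, the second just prints the per-brick counts.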

Considering these factors, I think it would be better if you upgraded your machines to 3.7.10.

Do let me know if migrating to 3.7.10 solves your issues.

-Krutika

On Mon, Apr 18, 2016 at 12:40 PM, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
Yes, but as I was saying, I don't believe KVM is using a mount point; I think it uses
the API (http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt).
I might be mistaken, of course. Proxmox does have a mount point for convenience, so I'll
attach those logs, hoping they contain the information you need. They do seem to contain
a lot of errors for the 15th.
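
As an illustration of what that looks like: with libgfapi the disk is addressed through a gluster:// URL rather than a local path, so (assuming qemu is built with gluster support, and with a made-up image path on our vm-storage volume) qemu sees something like:

 qemu-img info gluster://10.10.0.1/vm-storage/images/100/vm-100-disk-1.qcow2
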
For reference, the first brick (10.10.0.1) was disconnected in the morning, and the successful
heal that followed caused about 40 minutes of downtime for the VMs. Right after that heal finished
(if my memory is correct it was about noon or close) the second brick (10.10.0.2) rebooted, and
that's the one I disconnected to prevent the heal from causing another downtime.
I reconnected it once at the end of the afternoon, hoping the heal would go well, but everything
went down like in the morning, so I disconnected it again and waited until 11pm (23:00) to
reconnect it and let it finish.

Thanks for your help,


On Mon, Apr 18, 2016 at 12:28:28PM +0530, Krutika Dhananjay wrote:
> Sorry, I was referring to the glusterfs client logs.
>
> Assuming you are using FUSE mount, your log file will be in
> /var/log/glusterfs/<hyphenated-mount-point-path>.log
>
> -Krutika
>
> On Sun, Apr 17, 2016 at 9:37 PM, Kevin Lemonnier <lemonnierk@xxxxxxxxx>
> wrote:
>
> > I believe Proxmox is just an interface to KVM that uses the lib, so if I'm
> > not mistaken there aren't any client logs?
> >
> > It's not the first time I've had this issue; it happens on every heal on
> > the 2 clusters I have.
> >
> > I did let the heal finish that night and the VMs are working now, but it
> > is pretty scary for future crashes or brick replacements.
> > Should I maybe lower the shard size? It won't solve the fact that 2 bricks
> > out of 3 aren't keeping the filesystem usable, but it might make the
> > healing quicker, right?
> >
> > Thanks
> >
> > On 17 April 2016 at 17:56:37 GMT+02:00, Krutika Dhananjay <
> > kdhananj@xxxxxxxxxx> wrote:
> > >Could you share the client logs and information about the approx
> > >time/day
> > >when you saw this issue?
> > >
> > >-Krutika
> > >
> > >On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier
> > ><lemonnierk@xxxxxxxxx>
> > >wrote:
> > >
> > >> Hi,
> > >>
> > >> We have a small GlusterFS 3.7.6 cluster with 3 nodes running Proxmox
> > >> VMs on it. I did set up the different recommended options, like the
> > >> virt group, but by hand since it's on Debian. The shards are 256MB,
> > >> if that matters.
> > >>
> > >> This morning the second node crashed, and as it came back up it
> > >> started a heal, but that basically froze all the VMs running on that
> > >> volume. Since we really can't have 40 minutes of downtime in the
> > >> middle of the day, I just removed the node from the network, which
> > >> stopped the heal and allowed the VMs to access their disks again. The
> > >> plan was to reconnect the node in a couple of hours and let it heal
> > >> at night.
> > >> But a VM has crashed now, and it can't boot up again: it seems to
> > >> freeze while trying to access the disks.
> > >>
> > >> Looking at the heal info for the volume, it has gone way up since
> > >> this morning; it looks like the VMs aren't writing to both nodes,
> > >> just the one they are on.
> > >> That seems pretty bad: we have 2 nodes out of 3 up, so I would expect
> > >> the volume to work just fine since it has quorum. What am I missing?
> > >>
> > >> It is still too early to start the heal; is there a way to start the
> > >> VM anyway right now? I mean, it was running a moment ago, so the data
> > >> is there; the volume just needs to let the VM access it.
> > >>
> > >>
> > >>
> > >> Volume Name: vm-storage
> > >> Type: Replicate
> > >> Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
> > >> Status: Started
> > >> Number of Bricks: 1 x 3 = 3
> > >> Transport-type: tcp
> > >> Bricks:
> > >> Brick1: first_node:/mnt/vg1-storage
> > >> Brick2: second_node:/mnt/vg1-storage
> > >> Brick3: third_node:/mnt/vg1-storage
> > >> Options Reconfigured:
> > >> cluster.quorum-type: auto
> > >> cluster.server-quorum-type: server
> > >> network.remote-dio: enable
> > >> cluster.eager-lock: enable
> > >> performance.readdir-ahead: on
> > >> performance.quick-read: off
> > >> performance.read-ahead: off
> > >> performance.io-cache: off
> > >> performance.stat-prefetch: off
> > >> features.shard: on
> > >> features.shard-block-size: 256MB
> > >> cluster.server-quorum-ratio: 51%
> > >>
> > >>
> > >> Thanks for your help
> > >>
> > >> --
> > >> Kevin Lemonnier
> > >> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> > >>
> > >> _______________________________________________
> > >> Gluster-users mailing list
> > >> Gluster-users@xxxxxxxxxxx
> > >> http://www.gluster.org/mailman/listinfo/gluster-users
> > >>
> >
> > --
> > Sent from my Android device with K-9 Mail. Please excuse my brevity.
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

