I will try migrating to 3.7.10, is it considered stable yet? Should I change the self-heal algorithm even if I move to 3.7.10, or is that not necessary? Not sure what that change might do.
Anyway, I'll try to create a 3.7.10 cluster over the weekend and slowly move the VMs onto it.

Thanks a lot for your help,
Regards

On Mon, Apr 18, 2016 at 07:58:44PM +0530, Krutika Dhananjay wrote:
> Hi,
>
> Yeah, so the fuse mount log didn't convey much information.
>
> So one of the reasons heal may have taken so long (and also consumed resources) is because of a bug in self-heal where it would do heal from both source bricks in 3-way replication. With such a bug, heal would take twice the amount of time and consume resources both times by the same amount.
>
> This issue is fixed at http://review.gluster.org/#/c/14008/ and will be available in 3.7.12.
>
> The other thing you could do is to set cluster.data-self-heal-algorithm to 'full', for better heal performance and more regulated resource consumption:
> # gluster volume set <VOL> cluster.data-self-heal-algorithm full
>
> As far as sharding is concerned, some critical caching issues were fixed in 3.7.7 and 3.7.8.
> My guess is that the VM crash/unbootable state could be because of this issue, which exists in 3.7.6.
>
> 3.7.10 saw the introduction of throttled client-side heals, which also move such heals to the background; this is all the more helpful for preventing starvation of VMs during client heals.
>
> Considering these factors, I think it would be better if you upgraded your machines to 3.7.10.
>
> Do let me know if migrating to 3.7.10 solves your issues.
>
> -Krutika
>
> On Mon, Apr 18, 2016 at 12:40 PM, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
>
> > Yes, but as I was saying I don't believe KVM is using a mount point, I think it uses the API (http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt). I might be mistaken, of course. Proxmox does have a mount point for convenience; I'll attach those logs, hoping they contain the information you need. They do seem to contain a lot of errors for the 15th.
> > For reference, there was a disconnect of the first brick (10.10.0.1) in the morning and then a successful heal that caused about 40 minutes of downtime for the VMs. Right after that heal finished (if my memory is correct it was about noon or close), the second brick (10.10.0.2) rebooted, and that's the one I disconnected to prevent the heal from causing another downtime.
> > I reconnected it at the end of the afternoon, hoping the heal would go well, but everything went down like in the morning, so I disconnected it again and waited until 11pm (23:00) to reconnect it and let it finish.
> >
> > Thanks for your help,
> >
> >
> > On Mon, Apr 18, 2016 at 12:28:28PM +0530, Krutika Dhananjay wrote:
> > > Sorry, I was referring to the glusterfs client logs.
> > >
> > > Assuming you are using a FUSE mount, your log file will be in /var/log/glusterfs/<hyphenated-mount-point-path>.log
> > >
> > > -Krutika
> > >
> > > On Sun, Apr 17, 2016 at 9:37 PM, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
> > >
> > > > I believe Proxmox is just an interface to KVM that uses the lib, so if I'm not mistaken there aren't any client logs?
> > > >
> > > > It's not the first time I have the issue; it happens on every heal on the 2 clusters I have.
> > > >
> > > > I did let the heal finish that night and the VMs are working now, but it is pretty scary for future crashes or brick replacements.
> > > > Should I maybe lower the shard size? It won't solve the fact that 2 bricks out of 3 aren't keeping the filesystem usable, but it might make the healing quicker, right?
> > > >
> > > > Thanks
> > > >
> > > > On 17 April 2016 17:56:37 GMT+02:00, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:
> > > > > Could you share the client logs and information about the approximate time/day when you saw this issue?
> > > > >
> > > > > -Krutika
> > > > >
> > > > > On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> We have a small GlusterFS 3.7.6 cluster with 3 nodes, running Proxmox VMs on it. I did set up the different recommended options, like the virt group, but by hand since it's on Debian. The shards are 256MB, if that matters.
> > > > >>
> > > > >> This morning the second node crashed, and as it came back up it started a heal, but that basically froze all the VMs running on that volume. Since we really, really can't have 40 minutes of downtime in the middle of the day, I just removed the node from the network, and that stopped the heal, allowing the VMs to access their disks again. The plan was to reconnect the node in a couple of hours to let it heal at night.
> > > > >> But a VM crashed just now, and it can't boot up again: it seems to freeze trying to access the disks.
> > > > >>
> > > > >> Looking at the heal info for the volume, it has gone way up since this morning; it looks like the VMs aren't writing to both nodes, just the one they are on.
> > > > >> It seems pretty bad: we have 2 nodes out of 3 up, and I would expect the volume to work just fine since it has quorum. What am I missing?
> > > > >>
> > > > >> It is still too early to start the heal; is there a way to start the VM anyway right now? I mean, it was running a moment ago, so the data is there, it just needs to let the VM access it.
> > > > >>
> > > > >>
> > > > >> Volume Name: vm-storage
> > > > >> Type: Replicate
> > > > >> Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
> > > > >> Status: Started
> > > > >> Number of Bricks: 1 x 3 = 3
> > > > >> Transport-type: tcp
> > > > >> Bricks:
> > > > >> Brick1: first_node:/mnt/vg1-storage
> > > > >> Brick2: second_node:/mnt/vg1-storage
> > > > >> Brick3: third_node:/mnt/vg1-storage
> > > > >> Options Reconfigured:
> > > > >> cluster.quorum-type: auto
> > > > >> cluster.server-quorum-type: server
> > > > >> network.remote-dio: enable
> > > > >> cluster.eager-lock: enable
> > > > >> performance.readdir-ahead: on
> > > > >> performance.quick-read: off
> > > > >> performance.read-ahead: off
> > > > >> performance.io-cache: off
> > > > >> performance.stat-prefetch: off
> > > > >> features.shard: on
> > > > >> features.shard-block-size: 256MB
> > > > >> cluster.server-quorum-ratio: 51%
> > > > >>
> > > > >> Thanks for your help
> > > > >>
> > > > >> --
> > > > >> Kevin Lemonnier
> > > > >> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> > > >
> > > > --
> > > > Sent from my Android device with K-9 Mail. Please excuse my brevity.
> >
> > --
> > Kevin Lemonnier
> > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
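For reference, a rough sketch of how the options shown in the volume info above (plus Krutika's suggested self-heal algorithm change) could be applied by hand with the gluster CLI, along the lines of what Kevin describes doing on Debian. This is an illustration, not a transcript of the commands actually run: the volume name vm-storage is taken from the info above, and cluster.server-quorum-ratio is assumed to be set cluster-wide via "all"; verify option names and values against your installed GlusterFS version.

    # sharding and the usual virt-store tuning, applied by hand (values mirror the
    # "Options Reconfigured" listing above)
    gluster volume set vm-storage features.shard on
    gluster volume set vm-storage features.shard-block-size 256MB
    gluster volume set vm-storage performance.quick-read off
    gluster volume set vm-storage performance.read-ahead off
    gluster volume set vm-storage performance.io-cache off
    gluster volume set vm-storage performance.stat-prefetch off
    gluster volume set vm-storage network.remote-dio enable
    gluster volume set vm-storage cluster.eager-lock enable
    gluster volume set vm-storage cluster.quorum-type auto
    gluster volume set vm-storage cluster.server-quorum-type server
    # server-quorum-ratio is a cluster-wide option, hence "all" (assumed syntax)
    gluster volume set all cluster.server-quorum-ratio 51%

    # Krutika's suggestion: switch the data self-heal algorithm to 'full'
    gluster volume set vm-storage cluster.data-self-heal-algorithm full

    # watch pending heals while a brick is catching up
    gluster volume heal vm-storage info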