Some additional details, if it helps: there is no cache on the disk, it's
virtio with iothread=1. The file is qcow2 and, according to qemu-img check,
it is not corrupted, but while the VM is running I get I/O errors. As you
can see in the config, performance.stat-prefetch is off, but being on a
Debian system I don't have the virt group, so I just tried to replicate its
settings by hand. Maybe I forgot something.

Thanks !

On Wed, May 18, 2016 at 07:11:08PM +0530, Krutika Dhananjay wrote:
> Hi,
>
> I will try to recreate this issue tomorrow on my machines with the steps
> that Lindsay provided in this thread. I will let you know the result soon
> after that.
>
> -Krutika
>
> On Wednesday, May 18, 2016, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
> > Hi,
> >
> > Some news on this.
> > Over the weekend the RAID card of the node ipvr2 died, and I thought
> > that maybe that was the problem all along. The RAID card was changed
> > and yesterday I reinstalled everything.
> > Same problem just now.
> >
> > My test is simple: while using the website hosted on the VMs the whole
> > time, I reboot ipvr50 and wait for the heal to complete, migrate all
> > the VMs off ipvr2 and reboot it, wait for the heal to complete, then
> > migrate all the VMs off ipvr3 and reboot it.
> > Every time, the first database VM (which is the only one really using
> > its disk during the heal) starts showing I/O errors on its disk.
> >
> > Am I really the only one with that problem ?
> > Maybe one of the drives is dying too, who knows, but SMART isn't
> > saying anything ..
> >
> >
> > On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote:
> >> Hi,
> >>
> >> I had a problem some time ago with 3.7.6 and freezing during heals,
> >> and multiple people advised to use 3.7.11 instead. Indeed, with that
> >> version the freeze problem is fixed, it works like a dream ! You can
> >> almost not tell that a node is down or healing, everything keeps
> >> working except for a little freeze when the node has just gone down
> >> and, I assume, hasn't timed out yet, but that's fine.
> >>
> >> Now I have a 3.7.11 volume on 3 nodes for testing, and the VMs are
> >> Proxmox VMs with qcow2 disks stored on the gluster volume.
> >> Here is the config :
> >>
> >> Volume Name: gluster
> >> Type: Replicate
> >> Volume ID: e4f01509-beaf-447d-821f-957cc5c20c0a
> >> Status: Started
> >> Number of Bricks: 1 x 3 = 3
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: ipvr2.client:/mnt/storage/gluster
> >> Brick2: ipvr3.client:/mnt/storage/gluster
> >> Brick3: ipvr50.client:/mnt/storage/gluster
> >> Options Reconfigured:
> >> cluster.quorum-type: auto
> >> cluster.server-quorum-type: server
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.quick-read: off
> >> performance.read-ahead: off
> >> performance.io-cache: off
> >> performance.stat-prefetch: off
> >> features.shard: on
> >> features.shard-block-size: 64MB
> >> cluster.data-self-heal-algorithm: full
> >> performance.readdir-ahead: on
> >>
> >>
> >> As mentioned, I rebooted one of the nodes to test the freezing issue
> >> I had on previous versions, and apart from the initial timeout,
> >> nothing: the website hosted on the VMs keeps working like a charm
> >> even during heal.
> >> Since it's a test setup there isn't any load on it, though. I just
> >> tried to refresh the database by importing the production one on the
> >> two MySQL VMs, and both of them started showing I/O errors. I tried
> >> shutting them down and powering them on again, but same thing; even
> >> starting full heals by hand doesn't solve the problem, the disks are
> >> corrupted. They still work, but sometimes they remount their
> >> partitions read-only ..
> >>
> >> I believe there are a few people already using 3.7.11; has no one
> >> noticed corruption problems ? Anyone using Proxmox ?
> >> As already mentioned in multiple other threads on this mailing list
> >> by other users, I also pretty much always have shards in heal info,
> >> but nothing "stuck" there; they always go away in a few seconds,
> >> getting replaced by other shards.
> >>
> >> Thanks
> >>
> >> --
> >> Kevin Lemonnier
> >> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >
> >
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users@xxxxxxxxxxx
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
> > --
> > Kevin Lemonnier
> > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
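PS: for anyone wanting to compare against the settings replicated by hand
above, this is roughly what the virt group file that Debian is missing
(shipped as /var/lib/glusterd/groups/virt on distributions that include it)
contained in the 3.7.x era. This is from memory, so treat it as a sketch
and double-check against a package that actually ships the file:

```ini
# Approximate contents of /var/lib/glusterd/groups/virt (3.7.x era).
# Keys are resolved to their full option names by glusterd when applied
# with `gluster volume set <volname> group virt`.
quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=enable
quorum-type=auto
server-quorum-type=server
```

The options in the volume info above appear to match this list, so if the
file really looked like this, nothing obvious was forgotten.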