So the VMs were configured with cache set to none; I just tried cache=directsync and it seems to fix the issue. I still need to run more tests, but I did a couple already with that option and got no I/O errors. I never had to do this before, is it a known requirement? I found the clue in an old mail from this mailing list, did I miss some documentation saying you should be using directsync with GlusterFS?
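For reference, here is roughly how I'm switching the cache mode; the VM ID, disk slot and storage name are just placeholders for my setup, adjust them to yours:

    # Proxmox: change the cache mode of an existing disk
    # (101, virtio0 and the storage name "gluster" are placeholders)
    qm set 101 --virtio0 gluster:101/vm-101-disk-1.qcow2,cache=directsync

    # If I read it right, this ends up on the qemu command line as something like:
    #   -drive file=gluster://localhost/gluster/images/101/vm-101-disk-1.qcow2,format=qcow2,cache=directsync

As far as I understand it, directsync opens the image with O_DIRECT like none does, but it also doesn't report writes as completed until they have been synced, so maybe that is what makes the difference here; I'd appreciate confirmation from someone who knows the qemu side better.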

On Tue, May 24, 2016 at 11:33:28AM +0200, Kevin Lemonnier wrote:
> Hi,
>
> Some news on this.
> I actually don't need to trigger a heal to get corruption, so the problem
> is not the healing. Live migrating the VM seems to trigger corruption every
> time, and even without that just doing a database import, rebooting, then
> doing another import seems to corrupt as well.
>
> To check, I created local storages on each node on the same partition as the
> gluster bricks, on XFS, moved the VM disk onto each local storage and tested
> the same procedure one by one: no corruption. It seems to happen only on
> GlusterFS, so I'm not so sure it's hardware anymore: if it were hardware,
> local storage would corrupt too, right?
> Could I be missing some critical configuration for VM storage on my gluster volume?
>
>
> On Mon, May 23, 2016 at 01:54:30PM +0200, Kevin Lemonnier wrote:
> > Hi,
> >
> > I didn't specify it, but I use "localhost" to add the storage in Proxmox.
> > My thinking is that every Proxmox node is also a GlusterFS node, so that
> > should work fine.
> >
> > I don't want to use the "normal" way of setting a regular address in there,
> > because you can't change it afterwards in Proxmox, but could that be the
> > source of the problem? Maybe during live migration there are writes coming
> > from two different servers at the same time?
> >
> >
> > On Wed, May 18, 2016 at 07:11:08PM +0530, Krutika Dhananjay wrote:
> > > Hi,
> > >
> > > I will try to recreate this issue tomorrow on my machines with the steps
> > > that Lindsay provided in this thread. I will let you know the result soon
> > > after that.
> > >
> > > -Krutika
> > >
> > > On Wednesday, May 18, 2016, Kevin Lemonnier <lemonnierk@xxxxxxxxx> wrote:
> > > > Hi,
> > > >
> > > > Some news on this.
> > > > Over the weekend the RAID card of the node ipvr2 died, and I thought
> > > > that maybe that was the problem all along. The RAID card was changed
> > > > and yesterday I reinstalled everything.
> > > > Same problem just now.
> > > >
> > > > My test is simple: using the website hosted on the VMs the whole time,
> > > > I reboot ipvr50, wait for the heal to complete, migrate all the VMs off
> > > > ipvr2 then reboot it, wait for the heal to complete, then migrate all
> > > > the VMs off ipvr3 then reboot it.
> > > > Every time, the first database VM (which is the only one really using
> > > > the disk during the heal) starts showing I/O errors on its disk.
> > > >
> > > > Am I really the only one with that problem?
> > > > Maybe one of the drives is dying too, who knows, but SMART isn't
> > > > saying anything..
> > > >
> > > >
> > > > On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote:
> > > > > Hi,
> > > > >
> > > > > I had a problem some time ago with 3.7.6 and freezing during heals,
> > > > > and multiple people advised to use 3.7.11 instead. Indeed, with that
> > > > > version the freeze problem is fixed, it works like a dream! You can
> > > > > almost not tell that a node is down or healing, everything keeps
> > > > > working except for a little freeze when the node has just gone down
> > > > > and I assume hasn't timed out yet, but that's fine.
> > > > >
> > > > > Now I have a 3.7.11 volume on 3 nodes for testing, and the VMs are
> > > > > Proxmox VMs with qcow2 disks stored on the gluster volume.
> > > > > Here is the config:
> > > > >
> > > > > Volume Name: gluster
> > > > > Type: Replicate
> > > > > Volume ID: e4f01509-beaf-447d-821f-957cc5c20c0a
> > > > > Status: Started
> > > > > Number of Bricks: 1 x 3 = 3
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: ipvr2.client:/mnt/storage/gluster
> > > > > Brick2: ipvr3.client:/mnt/storage/gluster
> > > > > Brick3: ipvr50.client:/mnt/storage/gluster
> > > > > Options Reconfigured:
> > > > > cluster.quorum-type: auto
> > > > > cluster.server-quorum-type: server
> > > > > network.remote-dio: enable
> > > > > cluster.eager-lock: enable
> > > > > performance.quick-read: off
> > > > > performance.read-ahead: off
> > > > > performance.io-cache: off
> > > > > performance.stat-prefetch: off
> > > > > features.shard: on
> > > > > features.shard-block-size: 64MB
> > > > > cluster.data-self-heal-algorithm: full
> > > > > performance.readdir-ahead: on
> > > > >
> > > > >
> > > > > As mentioned, I rebooted one of the nodes to test the freezing issue
> > > > > I had on previous versions, and apart from the initial timeout,
> > > > > nothing: the website hosted on the VMs keeps working like a charm,
> > > > > even during heal.
> > > > > Since it's testing there isn't any load on it though, and I just
> > > > > tried to refresh the database by importing the production one on the
> > > > > two MySQL VMs, and both of them started showing I/O errors. I tried
> > > > > shutting them down and powering them on again, but same thing; even
> > > > > starting full heals by hand doesn't solve the problem, the disks are
> > > > > corrupted. They still work, but sometimes they remount their
> > > > > partitions read only..
> > > > >
> > > > > I believe there are a few people already using 3.7.11, has no one
> > > > > noticed corruption problems? Anyone using Proxmox? As already
> > > > > mentioned in multiple other threads on this mailing list by other
> > > > > users, I also pretty much always have shards in heal info, but
> > > > > nothing "stuck" there, they always go away in a few seconds, getting
> > > > > replaced by other shards.
> > > > >
> > > > > Thanks
> > > > >
> > > > > --
> > > > > Kevin Lemonnier
> > > > > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> > > >
> > > > --
> > > > Kevin Lemonnier
> > > > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >
> > --
> > Kevin Lemonnier
> > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users