Hi David,

What are the exact commands to be sure it's fine?

Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

Is it normal?

On Fri, Nov 18, 2016 at 2:51 AM, David Gossage <dgossage@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert <lambert.olivier@xxxxxxxxx>
> wrote:
>>
>> Okay, used the exact same config you provided, and adding an arbiter
>> node (node3)
>>
>> After halting node2, VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>>
>> Then, after waiting few minutes, halting node1. And **just** at this
>> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> etc.)
>>
> Other than waiting a few minutes did you make sure heals had completed?
>
>>
>> dmesg of the VM:
>>
>> [ 1645.852905] EXT4-fs error (device xvda1):
>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> inode=0, rec_len=0, name_len=0
>> [ 1645.854509] Aborting journal on device xvda1-8.
>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>
>> And got a lot of " comm bash: bad entry in directory" messages then...
>>
>> Here is the current config with all Node back online:
>>
>> # gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.stat-prefetch: on
>> performance.strict-write-ordering: off
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.data-self-heal: on
>>
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.0.1:/bricks/brick1/gv0      49152     0          Y       1331
>> Brick 10.0.0.2:/bricks/brick1/gv0      49152     0          Y       2274
>> Brick 10.0.0.3:/bricks/brick1/gv0      49152     0          Y       2355
>> Self-heal Daemon on localhost          N/A       N/A        Y       2300
>> Self-heal Daemon on 10.0.0.3           N/A       N/A        Y       10530
>> Self-heal Daemon on 10.0.0.2           N/A       N/A        Y       2425
>>
>> Task Status of Volume gv0
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>>
>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>> <lambert.olivier@xxxxxxxxx> wrote:
>> > It's planned to have an arbiter soon :) It was just preliminary tests.
>> >
>> > Thanks for the settings, I'll test this soon and I'll come back to you!
>> >
>> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> > <lindsay.mathieson@xxxxxxxxx> wrote:
>> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >>>
>> >>> gluster volume info gv0
>> >>>
>> >>> Volume Name: gv0
>> >>> Type: Replicate
>> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >>> Status: Started
>> >>> Snapshot Count: 0
>> >>> Number of Bricks: 1 x 2 = 2
>> >>> Transport-type: tcp
>> >>> Bricks:
>> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >>> Options Reconfigured:
>> >>> nfs.disable: on
>> >>> performance.readdir-ahead: on
>> >>> transport.address-family: inet
>> >>> features.shard: on
>> >>> features.shard-block-size: 16MB
>> >>
>> >>
>> >>
>> >> When hosting VM's its essential to set these options:
>> >>
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >> Also with replica two and quorum on (required) your volume will become
>> >> read-only when one node goes down to prevent the possibility of
>> >> split-brain - you *really* want to avoid that :)
>> >>
>> >> I'd recommend a replica 3 volume, that way 1 node can go down, but the
>> >> other two still form a quorum and will remain r/w.
>> >>
>> >> If the extra disks are not possible, then a Arbiter volume can be setup -
>> >> basically dummy files on the third node.
>> >>
>> >> --
>> >> Lindsay Mathieson
>> >>
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users@xxxxxxxxxxx
>> >> http://www.gluster.org/mailman/listinfo/gluster-users
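
On the question of checking heals before halting the next node: one way to script that check is to parse the text printed by `gluster volume heal gv0 info` and only proceed when every brick is connected and reports zero entries. The sketch below is illustrative only. The names `SAMPLE`, `parse_heal_info` and `heals_pending` are made up for this example, the sample text is the output pasted earlier in this thread, and treating any non-numeric entry count as "unknown" is an assumption about how a down brick is reported. It only summarises the output and does not by itself prove that it is safe to stop a node.

```python
# Output of "gluster volume heal gv0 info" as pasted in this thread.
SAMPLE = """\
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0
"""


def parse_heal_info(text):
    """Return {brick: {"status": str or None, "entries": int or None}}."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # A new per-brick section starts here.
            current = line[len("Brick "):]
            bricks[current] = {"status": None, "entries": None}
        elif current and line.startswith("Status:"):
            bricks[current]["status"] = line.split(":", 1)[1].strip()
        elif current and line.startswith("Number of entries:"):
            value = line.split(":", 1)[1].strip()
            # Assumption: anything that is not a plain integer means the
            # count is unknown (for example, the brick could not be queried).
            bricks[current]["entries"] = int(value) if value.isdigit() else None
    return bricks


def heals_pending(bricks):
    """True if any brick is not connected, has entries, or nothing was parsed."""
    if not bricks:
        return True  # no data at all: treat as "not verified"
    return any(
        info["status"] != "Connected" or info["entries"] != 0
        for info in bricks.values()
    )


if __name__ == "__main__":
    result = parse_heal_info(SAMPLE)
    for brick, info in result.items():
        print(brick, info["status"], info["entries"])
    print("heals pending:", heals_pending(result))
```

To use it against a live cluster, the text passed to `parse_heal_info` would come from running `gluster volume heal gv0 info` on one of the nodes (for example via `subprocess.run`), repeated until `heals_pending` returns False before the next node is halted.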