Yes, I did it only once I had the previous heal info result ("Number of entries: 0"). But the result is the same: as soon as the second node goes offline (after both were back online and working), everything is corrupted. To recap:

* Node 1 UP, Node 2 UP -> OK
* Node 1 UP, Node 2 DOWN -> OK (just a small lag for multipath to see the path down and switch over if necessary)
* Node 1 UP, Node 2 UP -> OK (after waiting until the heal command displays no entries)
* Node 1 DOWN, Node 2 UP -> NOT OK (data corruption)

On Fri, Nov 18, 2016 at 3:39 PM, David Gossage <dgossage@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert <lambert.olivier@xxxxxxxxx>
> wrote:
>>
>> Hi David,
>>
>> What are the exact commands to be sure it's fine?
>>
>> Right now I get:
>>
>> # gluster volume heal gv0 info
>> Brick 10.0.0.1:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.2:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.3:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
> Did you run this before taking down the 2nd node, to see if any heals were
> ongoing?
>
> Also, I see you have sharding enabled. Are your files already being served
> sharded as well?
>
>> Everything is online and working, but this command gives a strange output:
>>
>> # gluster volume heal gv0 info heal-failed
>> Gathering list of heal failed entries on volume gv0 has been
>> unsuccessful on bricks that are down. Please check if all brick
>> processes are running.
>>
>> Is it normal?
>
> I don't think that is a valid command anymore, as when I run it I get the
> same message, and this is in the logs:
> [2016-11-18 14:35:02.260503] I [MSGID: 106533]
> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume GLUSTER1
> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
> not supported.
> Please use "gluster volume heal GLUSTER1 info" and logs to
> find the heal information.
> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Heal' failed on localhost: Command not supported. Please
> use "gluster volume heal GLUSTER1 info" and logs to find the heal
> information.
>
>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
>> <dgossage@xxxxxxxxxxxxxxxxxx> wrote:
>> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
>> > <lambert.olivier@xxxxxxxxx> wrote:
>> >>
>> >> Okay, I used the exact same config you provided, adding an arbiter
>> >> node (node3).
>> >>
>> >> After halting node2, the VM continues to work after a small
>> >> "lag"/freeze. I restarted node2 and it was back online: OK.
>> >>
>> >> Then, after waiting a few minutes, I halted node1. And **just** at this
>> >> moment, the VM was corrupted (segmentation fault, /var/log folder
>> >> empty, etc.)
>> >>
>> > Other than waiting a few minutes, did you make sure heals had completed?
>> >
>> >> dmesg of the VM:
>> >>
>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> >> inode=0, rec_len=0, name_len=0
>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> >>
>> >> And then I got a lot of "comm bash: bad entry in directory" messages...
>> >> Here is the current config with all nodes back online:
>> >>
>> >> # gluster volume info
>> >>
>> >> Volume Name: gv0
>> >> Type: Replicate
>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> >> Status: Started
>> >> Snapshot Count: 0
>> >> Number of Bricks: 1 x (2 + 1) = 3
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >> performance.readdir-ahead: on
>> >> transport.address-family: inet
>> >> features.shard: on
>> >> features.shard-block-size: 16MB
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >> # gluster volume status
>> >> Status of volume: gv0
>> >> Gluster process                            TCP Port  RDMA Port  Online  Pid
>> >> ------------------------------------------------------------------------------
>> >> Brick 10.0.0.1:/bricks/brick1/gv0          49152     0          Y       1331
>> >> Brick 10.0.0.2:/bricks/brick1/gv0          49152     0          Y       2274
>> >> Brick 10.0.0.3:/bricks/brick1/gv0          49152     0          Y       2355
>> >> Self-heal Daemon on localhost              N/A       N/A        Y       2300
>> >> Self-heal Daemon on 10.0.0.3               N/A       N/A        Y       10530
>> >> Self-heal Daemon on 10.0.0.2               N/A       N/A        Y       2425
>> >>
>> >> Task Status of Volume gv0
>> >> ------------------------------------------------------------------------------
>> >> There are no active volume tasks
>> >>
>> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>> >> <lambert.olivier@xxxxxxxxx> wrote:
>> >> > It's planned to have an arbiter soon :) These were just preliminary
>> >> > tests.
>> >> >
>> >> > Thanks for the settings, I'll test this soon and come back to you!
>> >> >
>> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >> > <lindsay.mathieson@xxxxxxxxx> wrote:
>> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >> >>>
>> >> >>> gluster volume info gv0
>> >> >>>
>> >> >>> Volume Name: gv0
>> >> >>> Type: Replicate
>> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >> >>> Status: Started
>> >> >>> Snapshot Count: 0
>> >> >>> Number of Bricks: 1 x 2 = 2
>> >> >>> Transport-type: tcp
>> >> >>> Bricks:
>> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> >>> Options Reconfigured:
>> >> >>> nfs.disable: on
>> >> >>> performance.readdir-ahead: on
>> >> >>> transport.address-family: inet
>> >> >>> features.shard: on
>> >> >>> features.shard-block-size: 16MB
>> >> >>
>> >> >> When hosting VMs, it's essential to set these options:
>> >> >>
>> >> >> network.remote-dio: enable
>> >> >> cluster.eager-lock: enable
>> >> >> performance.io-cache: off
>> >> >> performance.read-ahead: off
>> >> >> performance.quick-read: off
>> >> >> performance.stat-prefetch: on
>> >> >> performance.strict-write-ordering: off
>> >> >> cluster.server-quorum-type: server
>> >> >> cluster.quorum-type: auto
>> >> >> cluster.data-self-heal: on
>> >> >>
>> >> >> Also, with replica two and quorum on (required), your volume will
>> >> >> become read-only when one node goes down, to prevent the possibility
>> >> >> of split-brain - you *really* want to avoid that :)
>> >> >>
>> >> >> I'd recommend a replica 3 volume; that way one node can go down, but
>> >> >> the other two still form a quorum and will remain r/w.
>> >> >>
>> >> >> If the extra disks are not possible, then an arbiter volume can be
>> >> >> set up - basically dummy files on the third node.
>> >> >>
>> >> >> --
>> >> >> Lindsay Mathieson

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
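A recurring step in this thread is waiting until `gluster volume heal <vol> info` reports zero pending entries on every brick before taking another node down. A minimal shell sketch of automating that check follows; the `count_pending`/`wait_for_heal` helper names and the default volume name `gv0` are illustrative assumptions, not commands from the thread:

```shell
#!/bin/sh

# count_pending: sum the "Number of entries:" lines from heal-info
# output read on stdin, printing a single total.
count_pending() {
  awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# wait_for_heal: poll heal info until no entries remain pending on any
# brick. Assumes it runs on a node of the trusted pool with the gluster
# CLI available (hypothetical helper, not part of gluster itself).
wait_for_heal() {
  vol=${1:-gv0}
  while pending=$(gluster volume heal "$vol" info | count_pending) \
        && [ "$pending" -gt 0 ]; do
    echo "$pending entries still pending on $vol, waiting..."
    sleep 10
  done
}
```

Only after such a check returns with zero entries would it match the precondition David asks about above ("did you make sure heals had completed?").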
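Lindsay's recommended options are applied one at a time with `gluster volume set`. A configuration sketch, assuming the volume is named `gv0` (run on any node of the trusted pool):

```shell
#!/bin/sh
# Apply the VM-hosting options recommended in the thread to volume gv0.
# Each entry is "key value", split intentionally into two arguments.
for opt in \
  "network.remote-dio enable" \
  "cluster.eager-lock enable" \
  "performance.io-cache off" \
  "performance.read-ahead off" \
  "performance.quick-read off" \
  "performance.stat-prefetch on" \
  "performance.strict-write-ordering off" \
  "cluster.server-quorum-type server" \
  "cluster.quorum-type auto" \
  "cluster.data-self-heal on"
do
  # word splitting of $opt into key and value is deliberate here
  gluster volume set gv0 $opt
done
```

These settings should then show up under "Options Reconfigured" in `gluster volume info gv0`, as in the outputs quoted above.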