Re: corruption using gluster and iSCSI with LIO

Hi David,

What are the exact commands to make sure everything is fine?

Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0


Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

Is this normal?
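
Should I be checking with one of these instead? I'm just guessing from
the CLI help here, so tell me if they're not the right check:

# gluster volume heal gv0 statistics heal-count
# gluster volume heal gv0 info split-brain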

On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
<dgossage@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert <lambert.olivier@xxxxxxxxx>
> wrote:
>>
>> Okay, I used the exact same config you provided and added an arbiter
>> node (node3).
>>
>> After halting node2, the VM continued to work after a small
>> "lag"/freeze. I restarted node2 and it came back online: OK
>>
>> Then, after waiting a few minutes, I halted node1. And **just** at
>> that moment, the VM was corrupted (segmentation fault, /var/log folder
>> empty, etc.)
>>
> Other than waiting a few minutes, did you make sure heals had completed?
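>
> Something like this before failing over the next node would confirm it
> (a rough sketch - it just polls "heal info" until every brick reports
> zero pending entries):
>
> while gluster volume heal gv0 info | grep -q "Number of entries: [^0]"; do
>     sleep 10   # entries still pending - wait and re-check
> done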
>
>>
>> dmesg of the VM:
>>
>> [ 1645.852905] EXT4-fs error (device xvda1):
>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> inode=0, rec_len=0, name_len=0
>> [ 1645.854509] Aborting journal on device xvda1-8.
>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>
>> And I got a lot of "comm bash: bad entry in directory" messages after that...
>>
>> Here is the current config with all Node back online:
>>
>> # gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.stat-prefetch: on
>> performance.strict-write-ordering: off
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.data-self-heal: on
>>
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.0.1:/bricks/brick1/gv0           49152     0          Y       1331
>> Brick 10.0.0.2:/bricks/brick1/gv0           49152     0          Y       2274
>> Brick 10.0.0.3:/bricks/brick1/gv0           49152     0          Y       2355
>> Self-heal Daemon on localhost               N/A       N/A        Y       2300
>> Self-heal Daemon on 10.0.0.3                N/A       N/A        Y       10530
>> Self-heal Daemon on 10.0.0.2                N/A       N/A        Y       2425
>>
>> Task Status of Volume gv0
>>
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>>
>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>> <lambert.olivier@xxxxxxxxx> wrote:
>> > An arbiter is planned soon :) Those were just preliminary tests.
>> >
>> > Thanks for the settings - I'll test them soon and get back to you!
>> >
>> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> > <lindsay.mathieson@xxxxxxxxx> wrote:
>> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >>>
>> >>> gluster volume info gv0
>> >>>
>> >>> Volume Name: gv0
>> >>> Type: Replicate
>> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >>> Status: Started
>> >>> Snapshot Count: 0
>> >>> Number of Bricks: 1 x 2 = 2
>> >>> Transport-type: tcp
>> >>> Bricks:
>> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >>> Options Reconfigured:
>> >>> nfs.disable: on
>> >>> performance.readdir-ahead: on
>> >>> transport.address-family: inet
>> >>> features.shard: on
>> >>> features.shard-block-size: 16MB
>> >>
>> >>
>> >>
>> >> When hosting VMs, it's essential to set these options (each one is
>> >> applied with "gluster volume set", as sketched after the list):
>> >>
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
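>> >>
>> >> For example, with your volume name (the same pattern works for every
>> >> option above):
>> >>
>> >> gluster volume set gv0 network.remote-dio enable
>> >> gluster volume set gv0 cluster.eager-lock enable
>> >> gluster volume set gv0 performance.io-cache off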
>> >>
>> >> Also, with replica two and quorum on (required), your volume will
>> >> become read-only when one node goes down, to prevent the possibility
>> >> of split-brain - you *really* want to avoid that :)
>> >>
>> >> I'd recommend a replica 3 volume - that way one node can go down and
>> >> the other two still form a quorum, so the volume remains r/w.
>> >>
>> >> If the extra disks are not possible, then an arbiter volume can be
>> >> set up - basically dummy (metadata-only) files on the third node.
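>> >>
>> >> If memory serves, the create syntax is something like this (a sketch
>> >> reusing your brick paths - double-check it against the docs):
>> >>
>> >> gluster volume create gv0 replica 3 arbiter 1 \
>> >>     10.0.0.1:/bricks/brick1/gv0 \
>> >>     10.0.0.2:/bricks/brick1/gv0 \
>> >>     10.0.0.3:/bricks/brick1/gv0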
>> >>
>> >>
>> >>
>> >> --
>> >> Lindsay Mathieson
>> >>
>
>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


