Re: Gluster Startup Issue

So I've tried using a lot of your script, but I'm still unable to get past the "Launching heal operation to perform full self heal on volume <volname> has been unsuccessful on bricks that are down. Please check if all brick processes are running." error message.  Everything else seems to be working, but "gluster volume heal appian full" never succeeds.

Is there any way to figure out what exactly causes this error message?  The logs aren't very useful in determining what happened; they just state that the "Commit" failed on the other bricks.
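
For reference, a minimal sketch of the checks that might narrow this down (assuming default log locations and the volume name "appian" from above; the glusterd log on the peer the commit failure points at, <server-ip-2>, is probably the most interesting one):

    gluster volume status appian        # every brick and the Self-heal Daemon should show Online = Y
    gluster volume heal appian info     # does a plain heal info reach all bricks?
    tail -n 50 /var/log/glusterfs/glustershd.log                   # self-heal daemon log
    tail -n 50 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log   # glusterd log (reason for the failed commit)
    tail -n 50 /var/log/glusterfs/cmd_history.log                  # record of CLI commands and their results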

Restarting the volume sometimes fixes it, but I'm not sure I want to run a script that restarts the volume over and over until "gluster volume heal appian full" works.
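
For what it's worth, a minimal sketch of retrying the heal itself instead of restarting the volume (volume name "appian" assumed; the idea is just to give glusterd time to bring the brick processes and self-heal daemon up before re-issuing the command):

    #!/bin/sh
    # Retry "heal full" a few times with a delay instead of restarting the volume.
    VOL=appian
    for attempt in 1 2 3 4 5 6; do
        if gluster volume heal "$VOL" full; then
            echo "heal full accepted on attempt $attempt"
            break
        fi
        # Show which processes glusterd currently thinks are down before the next try.
        gluster volume status "$VOL"
        sleep 10
    done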

On Thu, Jun 23, 2016 at 2:21 AM, Heiko L. <heikol@xxxxxxxxxxxxx> wrote:

A hostname is not needed.

# nodea=10.1.1.100;bricka=/mnt/sda6/brick4
should work,

but I prefer to work with hostnames.


regards heiko

PS: some notes I forgot:
- use xfs or zfs (ext3 works, but with partly poor performance on v3.4)
- the brick directory should not be the top directory of the filesystem:
  /dev/sda6 mounted on /mnt/brick4, brick=/mnt/brick4      -> not recommended
  /dev/sda6 mounted on /mnt/sda6,   brick=/mnt/sda6/brick4 -> better
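
(A minimal sketch of that layout, reusing the device and names from the example above; the /etc/hosts line is only needed if you go the hostname route:)

    # /etc/hosts entry if you prefer hostnames over raw IPs:
    #   10.1.1.100   nodea

    mkfs.xfs /dev/sda6              # xfs (or zfs); ext3 works but can perform poorly
    mkdir -p /mnt/sda6
    mount /dev/sda6 /mnt/sda6       # mount the filesystem on its own directory...
    mkdir -p /mnt/sda6/brick4       # ...and use a subdirectory, not the mountpoint, as the brick
    # then, e.g.: gluster volume create <volname> replica 3 transport tcp nodea:/mnt/sda6/brick4 ... force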

> Thank you for responding, Heiko.  I'm in the process of going through the
> differences between our two scripts.  The first thing I noticed was that the
> notes state the hosts "need to be defined in the /etc/hosts".  Would using
> the IP address directly be a problem?
>
> On Tue, Jun 21, 2016 at 2:10 PM, Heiko L. <heikol@xxxxxxxxxxxxx> wrote:
>
>> Am Di, 21.06.2016, 19:22 schrieb Danny Lee:
>> > Hello,
>> >
>> >
>> > We are currently figuring out how to add GlusterFS to our system, using
>> > scripts, to make it highly available.  We are using Gluster 3.7.11.
>> >
>> > Problem:
>> > We are trying to migrate from a non-clustered system to a 3-node
>> > GlusterFS replicated cluster using scripts.  We have tried various things
>> > to make this work, but it sometimes leaves us in an undesirable state
>> > where, if you call "gluster volume heal <volname> full", we get the error
>> > message, "Launching heal operation to perform full self heal on volume
>> > <volname> has been unsuccessful on bricks that are down. Please check if
>> > all brick processes are running."  All the brick processes are running
>> > according to the command "gluster volume status <volname>".
>> >
>> > Things we have tried (in order of preference):
>> > 1. Create the volume with 3 filesystems holding the same data
>> > 2. Create the volume with 2 empty filesystems and one with the data
>> > 3. Create the volume with only one filesystem with data, then use the
>> >    "add-brick" command to add the other two empty filesystems
>> > 4. Create the volume with one empty filesystem, mount it, copy the data
>> >    over to it, and then finally use the "add-brick" command to add the
>> >    other two empty filesystems
>> - should work
>> - read each file on /mnt/gvol to trigger replication [2]
>>
>> > 5. Create the volume with 3 empty filesystems, mount it, and then copy
>> >    the data over
>> - my favorite
>>
>> >
>> > Other things to note:
>> > A few minutes after the volume is created and started successfully, our
>> > application server starts up against it, so reads and writes may happen
>> > pretty quickly after the volume has started.  But there is only about
>> > 50MB of data.
>> >
>> > Steps to reproduce (all in a script):
>> > # This is run by the primary node that has the data, with IP address <server-ip-1>
>> > systemctl restart glusterd
>> > gluster peer probe <server-ip-2>
>> > gluster peer probe <server-ip-3>
>> > # Wait for "gluster peer status" to show "Peer in Cluster" for all peers
>> > gluster volume create <volname> replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]} ${BRICKS[2]} force
>> > gluster volume set <volname> nfs.disable true
>> > gluster volume start <volname>
>> > mkdir -p $MOUNT_POINT
>> > mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
>> > find $MOUNT_POINT | xargs stat
>>
>> I have written a script for 2 nodes [1],
>> but there should be at least 3 nodes.
>>
>>
>> I hope it helps you
>> regards Heiko
>>
>> >
>> > Note that when we added sleeps around the gluster commands, there was a
>> > higher probability of success, but not 100%.
>> >
>> > # Once the volume is started, all the clients/servers will mount the
>> > # gluster filesystem by polling "mountpoint -q $MOUNT_POINT":
>> > mkdir -p $MOUNT_POINT
>> > mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
>> >
>> >
>> > Logs:
>> > *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
>> >
>> >
>> > [2016-06-21 14:10:38.285234] I [MSGID: 106533]
>> > [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management:
>> > Received heal vol req for volume volname
>> > [2016-06-21 14:10:38.296801] E [MSGID: 106153]
>> > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
>> > <server-ip-2>. Please check log file for details.
>> >
>> >
>> >
>> > *usr-local-volname-data-mirrored-data.log* in *server-ip-1*
>> >
>> >
>> > [2016-06-21 14:14:39.233366] E [MSGID: 114058]
>> > [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0:
>> > failed to get the port number for remote subvolume. Please run 'gluster
>> > volume status' on server to see if brick process is running.
>> > *I think this is caused by the self heal daemon*
>> >
>> >
>> > *cmd_history.log* in *server-ip-1*
>> >
>> >
>> > [2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED : Commit
>> > failed on <server-ip-2>. Please check log file for details.
>>
>> [1]
>> http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt
>>   - old, limited to 2 nodes



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
