Re: Some bricks are offline after restart, how to bring them online gracefully?

Hi Jan,

Comments inline.

On Fri, Jun 30, 2017 at 1:31 AM, Jan <jan.h.zak@xxxxxxxxx> wrote:
> Hi all,
>
> Gluster and Ganesha are amazing. Thank you for this great work!
>
> I’m struggling with one issue and I think that you might be able to help me.
>
> I spent some time playing with Gluster and Ganesha, and after I gained some
> experience I decided to go into production, but I’m still struggling with
> one issue.
>
> I have a 3-node CentOS 7.3 setup with the most current Gluster and Ganesha
> from the centos-gluster310 repository (3.10.2-1.el7) with replicated bricks.
>
> Servers have a lot of resources and they run in a subnet on a stable
> network.
>
> I didn’t have any issues when I tested a single brick. But now I’d like to
> set up 17 replicated bricks, and I realized that when I restart one of the
> nodes the result looks like this:
>
> sudo gluster volume status | grep ' N '
>
> Brick glunode0:/st/brick3/dir          N/A       N/A        N       N/A
> Brick glunode1:/st/brick2/dir          N/A       N/A        N       N/A
>

Did you try it multiple times?

> Some bricks just don’t come online. Sometimes it’s one brick, sometimes
> three, and it’s not the same brick each time; it’s a random issue.
>
> I checked log on affected servers and this is an example:
>
> sudo tail /var/log/glusterfs/bricks/st-brick3-0.log
>
> [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:
> readv on 10.2.44.23:24007 failed (No data available)
> [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No data
> available)
> [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: Exhausted all volfile servers
> [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f3158032dc5]
> -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]
> -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f31596cbdfb] )
> 0-:received signum (15), shutting down
> [2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect] 0-glusterfs:
> connection attempt on 10.2.44.23:24007 failed, (Network is unreachable)
> [2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_request]
> 0-glusterfs: not connected (priv->connected = 0)
> [2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_submit]
> 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster
> Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
>
> I think the important message is “Network is unreachable”.
>
> Question
> 1. Could you please tell me, is that normal when you have many bricks? The
> network is definitely stable, other servers use it without problems, and
> all servers run on the same pair of switches. My assumption is that many
> bricks try to connect at the same time and that doesn’t work.

No, this shouldn't happen even with multiple bricks.
There was a bug related to this [1].
To verify whether that was the issue, I need to know a few things:
1) Are all the nodes running the same version?
2) Did you check for the brick process using the ps command?
We need to verify whether the brick is actually still up and just not
connected to glusterd.
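
For example, something along these lines on the affected node (the brick
path is just taken from the status output you posted; adjust it to whichever
brick shows N):

  # confirm every node runs the same Gluster version
  gluster --version

  # check whether the brick process is still running even though
  # "gluster volume status" reports it as offline
  ps aux | grep glusterfsd | grep '/st/brick3/dir'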


>
> 2. Is there an option to configure a brick to enable some kind of
> autoreconnect or add some timeout?
> gluster volume set brick123 option456 abc ??
If the brick process is not seen in "ps aux | grep glusterfsd", the way to
start the brick is the "gluster volume start force" command.
If the brick is not started there is no point in configuring it, and a brick
cannot be started with a volume set (configuration) command.
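
Something like this (the volume name is a placeholder; use the volume that
owns the offline brick):

  # start only the brick processes that are not currently running
  sudo gluster volume start <volname> force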

>
> 3. What is the recommended way to fix an offline brick on the affected
> server? I don’t want to use “gluster volume stop/start” since the affected
> bricks are online on the other servers and there is no reason to completely
> turn them off.
"gluster volume start <volname> force" will not bring down the bricks that
are already up and running; it only starts the ones that are offline.
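
After running it you can re-check with the same filter you used earlier; an
empty result means every brick shows Online = Y:

  sudo gluster volume status | grep ' N '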

>
> Thank you,
> Jan
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Regards,
Hari Gowtham.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users



