Re: Issue in Adding/Removing the gluster node

Hi Abhishek,

I went through your logs from node 1, and the glusterd logs clearly indicate that your 2nd node (10.32.1.144) was disconnected from the cluster; that is why the remove-brick operation failed. I think you need to check your network interface.
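
As a quick sanity check from node 1, something like the following should show whether the peer is reachable and whether glusterd saw disconnects (a minimal sketch; the log path is the usual default and may differ on your boards):

    ping -c 3 10.32.1.144
    gluster peer status
    grep -i disconnect /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -n 5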

The surprising thing, though, is that I did not see a duplicate peer entry in the #gluster peer status command output.

I may get some more information from the logs of your 2nd node (10.32.1.144). Could you also attach your 2nd node logs?

After restarting glusterd, are you still seeing the duplicate peer entry in the #gluster peer status command output?
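
For the restart itself, something along these lines should do (a sketch; the systemctl form assumes a systemd-based system, which may not match your boards):

    pkill glusterd         # stop the management daemon on the node
    glusterd               # start it again (or: systemctl restart glusterd)
    gluster peer status    # check whether the duplicate entry is still listed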

I will wait for the 2nd node logs to further analyze the duplicate peer entry problem.

Thanks,

~Gaurav

----- Original Message -----
From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Monday, February 22, 2016 12:48:55 PM
Subject: Re:  Issue in Adding/Removing the gluster node

Hi Gaurav,

Here you can find the attached logs for the boards for the remove-brick
failure case. These logs do not include cmd_history.log and
etc-glusterfs-glusterd.vol.log for the second board.

We may need some more time to collect those.


Regards,
Abhishek

On Mon, Feb 22, 2016 at 10:18 AM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:

> Hi Abhishek,
>
> >>  I'll provide the required log to you.
>
> sure
>
> On both nodes, do "pkill glusterd" and then start the glusterd service.
>
> Thanks,
>
> ~Gaurav
>
> ----- Original Message -----
> From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
> Cc: gluster-users@xxxxxxxxxxx
> Sent: Monday, February 22, 2016 10:11:48 AM
> Subject: Re:  Issue in Adding/Removing the gluster node
>
> Hi Gaurav,
>
> Thanks for your prompt reply.
>
> I'll provide the required log to you.
>
> As a workaround you suggested restarting the glusterd service. Could you
> please tell me at which point I should do this?
>
> Regards,
> Abhishek
>
> On Fri, Feb 19, 2016 at 6:11 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
>
> > Hi Abhishek,
> >
> > The peer status output looks interesting: it has a stale entry, which
> > technically should not happen. I have a few things to ask:
> >
> > Did you perform any manual operation on the GlusterFS configuration files
> > which reside in the /var/lib/glusterd/* folder?
> >
> > Can you provide output of "ls /var/lib/glusterd/peers"  from both of your
> > nodes.
> >
> > Could you provide the output of the #gluster peer status command when the
> > 2nd node is down?
> >
> > Can you provide the output of the #gluster volume info command?
> >
> > Can you provide the full log details of cmd_history.log and
> > etc-glusterfs-glusterd.vol.log from both nodes?
> >
> >
> > You can restart glusterd for now as a workaround (see the command sketch
> > below), but we need to analyze this issue further.
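> >
> > A minimal sketch for collecting the above on both nodes (the log paths are
> > the usual defaults and may differ on your setup):
> >
> >     ls /var/lib/glusterd/peers
> >     gluster peer status
> >     gluster volume info
> >     cp /var/log/glusterfs/cmd_history.log /tmp/
> >     cp /var/log/glusterfs/etc-glusterfs-glusterd.vol.log /tmp/
> >
> > For the workaround, "pkill glusterd" and then starting glusterd again (for
> > example by running "glusterd" directly) should be enough.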
> >
> > Thanks,
> > Gaurav
> >
> > ----- Original Message -----
> > From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
> > Cc: gluster-users@xxxxxxxxxxx
> > Sent: Friday, February 19, 2016 5:27:21 PM
> > Subject: Re:  Issue in Adding/Removing the gluster node
> >
> > Hi Gaurav,
> >
> > After the add-brick failure, the following is the output of the "gluster
> > peer status" command:
> >
> > Number of Peers: 2
> >
> > Hostname: 10.32.1.144
> > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > State: Peer in Cluster (Connected)
> >
> > Hostname: 10.32.1.144
> > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > State: Peer in Cluster (Connected)
> >
> > Regards,
> > Abhishek
> >
> > On Fri, Feb 19, 2016 at 5:21 PM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx>
> > wrote:
> >
> > > Hi Gaurav,
> > >
> > > Both boards are connected through the backplane using Ethernet.
> > >
> > > This inconsistency also occurs when I am bringing the node back into the
> > > slot. Sometimes the add-brick executes without failure, but sometimes the
> > > following error occurs:
> > >
> > > volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick
> > > force : FAILED : Another transaction is in progress for c_glusterfs.
> > > Please try again after sometime.
> > >
> > >
> > > You can also see the attached logs for the add-brick failure scenario.
> > >
> > > Please let me know if you need more logs.
> > >
> > > Regards,
> > > Abhishek
> > >
> > >
> > > On Fri, Feb 19, 2016 at 5:03 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
> > >
> > >> Hi Abhishek,
> > >>
> > >> How are you connecting the two boards, and how are you removing the 2nd
> > >> board manually? I need to know this because if you remove your 2nd board
> > >> from the cluster (abrupt shutdown), then you should not be able to
> > >> perform a remove-brick of the 2nd node's brick from the first node, yet
> > >> it is happening successfully in your case. Could you check your network
> > >> connection once again while removing and bringing back your node?
> > >>
> > >> Thanks,
> > >> Gaurav
> > >>
> > >> ------------------------------
> > >> *From: *"ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > >> *To: *"Gaurav Garg" <ggarg@xxxxxxxxxx>
> > >> *Cc: *gluster-users@xxxxxxxxxxx
> > >> *Sent: *Friday, February 19, 2016 3:36:21 PM
> > >>
> > >> *Subject: *Re:  Issue in Adding/Removing the gluster node
> > >>
> > >> Hi Gaurav,
> > >>
> > >> Thanks for the reply.
> > >>
> > >> 1. Here, I removed the board manually, and this time it worked fine:
> > >>
> > >> [2016-02-18 10:03:40.601472]  : volume remove-brick c_glusterfs replica 1
> > >> 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > >> [2016-02-18 10:03:40.885973]  : peer detach 10.32.1.144 : SUCCESS
> > >>
> > >> Yes, this time the board was reachable, but how? I don't know, because
> > >> the board was detached.
> > >>
> > >> 2. Here, I attached the board again, and this time add-brick worked fine:
> > >>
> > >> [2016-02-18 10:03:42.065038]  : peer probe 10.32.1.144 : SUCCESS
> > >> [2016-02-18 10:03:44.563546]  : volume add-brick c_glusterfs replica 2
> > >> 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > >>
> > >> 3. Here, I removed the board again, and this time a failure occurred:
> > >>
> > >> [2016-02-18 10:37:02.816089]  : volume remove-brick c_glusterfs replica 1
> > >> 10.32.1.144:/opt/lvmdir/c2/brick force : FAILED : Incorrect brick
> > >> 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > >>
> > >> But here the board was not reachable.
> > >>
> > >> Why is there this inconsistency when doing the same steps multiple
> > >> times?
> > >>
> > >> Hope you are getting my point.
> > >>
> > >> Regards,
> > >> Abhishek
> > >>
> > >> On Fri, Feb 19, 2016 at 3:25 PM, Gaurav Garg <ggarg@xxxxxxxxxx>
> wrote:
> > >>
> > >>> Abhishek,
> > >>>
> > >>> When it sometimes works fine, that means the 2nd board's network
> > >>> connection is reachable from the first node. You can confirm this by
> > >>> executing the same #gluster peer status command.
> > >>>
> > >>> Thanks,
> > >>> Gaurav
> > >>>
> > >>> ----- Original Message -----
> > >>> From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > >>> To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
> > >>> Cc: gluster-users@xxxxxxxxxxx
> > >>> Sent: Friday, February 19, 2016 3:12:22 PM
> > >>> Subject: Re:  Issue in Adding/Removing the gluster node
> > >>>
> > >>> Hi Gaurav,
> > >>>
> > >>> Yes, you are right. I am forcefully detaching the node from the slave,
> > >>> and when we remove the board it gets disconnected from the other board.
> > >>>
> > >>> But my question is: I am doing this process multiple times; sometimes
> > >>> it works fine, but sometimes it gives these errors.
> > >>>
> > >>>
> > >>> You can see the following logs from the cmd_history.log file:
> > >>>
> > >>> [2016-02-18 10:03:34.497996]  : volume set c_glusterfs nfs.disable on : SUCCESS
> > >>> [2016-02-18 10:03:34.915036]  : volume start c_glusterfs force : SUCCESS
> > >>> [2016-02-18 10:03:40.250326]  : volume status : SUCCESS
> > >>> [2016-02-18 10:03:40.273275]  : volume status : SUCCESS
> > >>> [2016-02-18 10:03:40.601472]  : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > >>> [2016-02-18 10:03:40.885973]  : peer detach 10.32.1.144 : SUCCESS
> > >>> [2016-02-18 10:03:42.065038]  : peer probe 10.32.1.144 : SUCCESS
> > >>> [2016-02-18 10:03:44.563546]  : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > >>> [2016-02-18 10:30:53.297415]  : volume status : SUCCESS
> > >>> [2016-02-18 10:30:53.313096]  : volume status : SUCCESS
> > >>> [2016-02-18 10:37:02.748714]  : volume status : SUCCESS
> > >>> [2016-02-18 10:37:02.762091]  : volume status : SUCCESS
> > >>> [2016-02-18 10:37:02.816089]  : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : FAILED : Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > >>>
> > >>>
> > >>> On Fri, Feb 19, 2016 at 3:05 PM, Gaurav Garg <ggarg@xxxxxxxxxx>
> wrote:
> > >>>
> > >>> > Hi Abhishek,
> > >>> >
> > >>> > It seems your peer 10.32.1.144 disconnected while doing the
> > >>> > remove-brick. See the below logs in glusterd:
> > >>> >
> > >>> > [2016-02-18 10:37:02.816009] E [MSGID: 106256]
> > >>> > [glusterd-brick-ops.c:1047:__glusterd_handle_remove_brick] 0-management:
> > >>> > Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > >>> > [Invalid argument]
> > >>> > [2016-02-18 10:37:02.816061] E [MSGID: 106265]
> > >>> > [glusterd-brick-ops.c:1088:__glusterd_handle_remove_brick] 0-management:
> > >>> > Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > >>> > The message "I [MSGID: 106004]
> > >>> > [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer
> > >>> > <10.32.1.144> (<6adf57dc-c619-4e56-ae40-90e6aef75fe9>), in state <Peer in
> > >>> > Cluster>, has disconnected from glusterd." repeated 25 times between
> > >>> > [2016-02-18 10:35:43.131945] and [2016-02-18 10:36:58.160458]
> > >>> >
> > >>> >
> > >>> >
> > >>> > If you are facing the same issue now, could you paste your #gluster
> > >>> > peer status command output here?
> > >>> >
> > >>> > Thanks,
> > >>> > ~Gaurav
> > >>> >
> > >>> > ----- Original Message -----
> > >>> > From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > >>> > To: gluster-users@xxxxxxxxxxx
> > >>> > Sent: Friday, February 19, 2016 2:46:35 PM
> > >>> > Subject:  Issue in Adding/Removing the gluster node
> > >>> >
> > >>> > Hi,
> > >>> >
> > >>> >
> > >>> > I am working on a two-board setup where the boards are connected to
> > >>> > each other. Gluster version 3.7.6 is running with two bricks added in
> > >>> > replica 2 mode, but when I manually remove (detach) one board from the
> > >>> > setup I get the following error:
> > >>> >
> > >>> > volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick
> > >>> > force : FAILED : Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for
> > >>> > volume c_glusterfs
> > >>> >
> > >>> > Please find the log files attached.
> > >>> >
> > >>> >
> > >>> > Regards,
> > >>> > Abhishek
> > >>> >
> > >>> >
> > >>> > _______________________________________________
> > >>> > Gluster-users mailing list
> > >>> > Gluster-users@xxxxxxxxxxx
> > >>> > http://www.gluster.org/mailman/listinfo/gluster-users
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Regards
> > >>> Abhishek Paliwal
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >>
> > >>
> > >>
> > >> Regards
> > >> Abhishek Paliwal
> > >>
> > >>
> > >
> > >
> > >
> > >
> >
> >
> > --
> >
> >
> >
> >
> > Regards
> > Abhishek Paliwal
> >
>



-- 




Regards
Abhishek Paliwal
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


