Re: Issue in Adding/Removing the gluster node

Hi Gaurav,

In my case we are removing the brick while it is in the offline state, using the force option, in the following way:

gluster volume remove-brick %s replica 1 %s:%s force --mode=script

but remove-brick is still failing.

It seems the brick we are trying to remove is reported as not present. Here are the log snippets from both boards:
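
For reference, with the volume and brick names used in this setup, that templated command expands to roughly the following (assuming the 10.32.1.144 brick is the one being removed, as in the cmd_history entries quoted later in this thread):

    gluster volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force --mode=script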

1st Board:

# gluster volume info
Volume Name: c_glusterfs
Type: Replicate
Volume ID: 32793e91-6f88-4f29-b3e4-0d53d02a4b99
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
nfs.disable: on
network.ping-timeout: 4
performance.readdir-ahead: on
# gluster peer status 
Number of Peers: 1
 
Hostname: 10.32.1.144
Uuid: b88c74b9-457d-4864-9fe6-403f6934d7d1
State: Peer in Cluster (Connected)
# gluster volume status c_glusterfs 
Status of volume: c_glusterfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.32.0.48:/opt/lvmdir/c2/brick       49153     0          Y       2537 
Self-heal Daemon on localhost               N/A       N/A        Y       5577 
Self-heal Daemon on 10.32.1.144             N/A       N/A        Y       3850 
 
Task Status of Volume c_glusterfs
------------------------------------------------------------------------------
There are no active volume tasks

2nd Board:

# gluster volume info
 
Volume Name: c_glusterfs
Type: Replicate
Volume ID: 32793e91-6f88-4f29-b3e4-0d53d02a4b99
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
performance.readdir-ahead: on
network.ping-timeout: 4
nfs.disable: on
# gluster peer status 
Number of Peers: 1
 
Hostname: 10.32.0.48
Uuid: e7c4494e-aa04-4909-81c9-27a462f6f9e7
State: Peer in Cluster (Connected)
# gluster volume status c_glusterfs 
Status of volume: c_glusterfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.32.0.48:/opt/lvmdir/c2/brick       49153     0          Y       2537 
Self-heal Daemon on localhost               N/A       N/A        Y       3850 
Self-heal Daemon on 10.32.0.48              N/A       N/A        Y       5577 
 
Task Status of Volume c_glusterfs
------------------------------------------------------------------------------
There are no active volume tasks

Do you know why these outputs are not showing the brick info for 10.32.1.144 in gluster volume status?
We are asking because we are not able to collect the cmd_history.log file from the 2nd board.

Regards,
Abhishek


On Tue, Feb 23, 2016 at 12:02 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
Hi abhishek,

>> Can we perform a remove-brick operation on an offline brick? What is the
meaning of an offline versus online brick?

No, you can't perform a remove-brick operation on an offline brick. A brick being offline means its brick process is not running. You can see this by executing #gluster volume status: if a brick is offline, the respective brick will show "N" in the Online column of that command's output. Alternatively, you can check whether the glusterfsd process for that brick is running by executing #ps aux | grep glusterfsd; this will list all the brick processes, and from that you can tell which bricks are online and which are not.
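
As a rough illustration (a sketch only, using the volume name and brick path from this thread; the column layout of the status output may differ in other GlusterFS versions), the check could be scripted like this:

    #!/bin/sh
    # Report whether a given brick shows "Y" in the Online column of volume status.
    VOL=c_glusterfs
    BRICK=10.32.1.144:/opt/lvmdir/c2/brick

    if gluster volume status "$VOL" | grep "$BRICK" | awk '{print $(NF-1)}' | grep -q '^Y$'; then
        echo "brick $BRICK is online (glusterfsd running)"
    else
        echo "brick $BRICK is offline or not listed"
    fi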

But if you want to perform a remove-brick operation on an offline brick, then you need to execute it with the force option: #gluster volume remove-brick <volname> hostname:/brick_name force. Note that this might lead to data loss.
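
For a replicated volume such as the one in this thread, the new replica count is passed along with force as well, as in the command recorded in the cmd_history.log entries quoted further down:

    gluster volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force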



>> Also, is there any way in gluster to check whether connectivity to a node
is established before performing any operation on a brick?

Yes, you can check it by executing the #gluster peer status command.
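
As a rough pre-check sketch (not a built-in gluster gate; the peer address below is the one from this thread), you could wrap the brick operation like this:

    #!/bin/sh
    # Only proceed with a brick operation if the peer is reported as connected.
    PEER=10.32.1.144

    if gluster peer status | grep -A 2 "Hostname: $PEER" | grep -q "State: Peer in Cluster (Connected)"; then
        echo "$PEER is connected; proceeding with the brick operation"
    else
        echo "$PEER is not connected; skipping the brick operation"
    fi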


Thanks,

~Gaurav


----- Original Message -----
From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Tuesday, February 23, 2016 11:50:43 AM
Subject: Re: Issue in Adding/Removing the gluster node

Hi Gaurav,

one general question related to gluster bricks.

Can we perform a remove-brick operation on an offline brick? What is the
meaning of an offline versus online brick?
Also, is there any way in gluster to check whether connectivity to a node
is established before performing any operation on a brick?

Regards,
Abhishek

On Mon, Feb 22, 2016 at 2:42 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:

> Hi abhishek,
>
> I went through your logs from node 1, and the glusterd logs clearly
> indicate that your 2nd node (10.32.1.144) disconnected from the cluster;
> because of that, the remove-brick operation failed. I think you need to
> check your network interface.
>
> But the surprising thing is that I did not see a duplicate peer entry in
> the #gluster peer status command output.
>
> Maybe I will get some more information from your 2nd node's (10.32.1.144)
> logs. Could you also attach your 2nd node's logs?
>
> After restarting glusterd, are you seeing a duplicate peer entry in the
> #gluster peer status command output?
>
> I will wait for the 2nd node's logs to further analyze the duplicate peer
> entry problem.
>
> Thanks,
>
> ~Gaurav
>
> ----- Original Message -----
> From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
> Cc: gluster-users@xxxxxxxxxxx
> Sent: Monday, February 22, 2016 12:48:55 PM
> Subject: Re: Issue in Adding/Removing the gluster node
>
> Hi Gaurav,
>
> Here you can find the attached logs for the boards in the remove-brick
> failure case.
> These logs do not include the cmd_history and
> etc-glusterfs-glusterd.vol.log for the second board.
>
> We may need some more time for those.
>
>
> Regards,
> Abhishek
>
> On Mon, Feb 22, 2016 at 10:18 AM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
>
> > Hi Abhishek,
> >
> > >>  I'll provide the required log to you.
> >
> > sure
> >
> > On both nodes, do "pkill glusterd" and then start the glusterd service.
> >
> > Thanks,
> >
> > ~Gaurav
> >
> > ----- Original Message -----
> > From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
> > Cc: gluster-users@xxxxxxxxxxx
> > Sent: Monday, February 22, 2016 10:11:48 AM
> > Subject: Re: Issue in Adding/Removing the gluster node
> >
> > Hi Gaurav,
> >
> > Thanks for your prompt reply.
> >
> > I'll provide the required log to you.
> >
> > As a workaround you suggested restarting the glusterd service. Could you
> > please tell me at what point I should do this?
> >
> > Regards,
> > Abhishek
> >
> > On Fri, Feb 19, 2016 at 6:11 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
> >
> > > Hi Abhishek,
> > >
> > > The peer status output looks interesting in that it has a stale entry;
> > > technically that should not happen. A few things I need to ask:
> > >
> > > Did you perform any manual operation on the GlusterFS configuration
> > > files that reside in the /var/lib/glusterd/* folder?
> > >
> > > Can you provide output of "ls /var/lib/glusterd/peers"  from both of
> your
> > > nodes.
> > >
> > > Could you provide the output of the #gluster peer status command when
> > > the 2nd node is down?
> > >
> > > Can you provide the output of the #gluster volume info command?
> > >
> > > Can you provide the full cmd_history.log and
> > > etc-glusterfs-glusterd.vol.log from both nodes?
> > >
> > >
> > > You can restart your glusterd for now as a workaround, but we need to
> > > analyze this issue further.
> > >
> > > Thanks,
> > > Gaurav
> > >
> > > ----- Original Message -----
> > > From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > > To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
> > > Cc: gluster-users@xxxxxxxxxxx
> > > Sent: Friday, February 19, 2016 5:27:21 PM
> > > Subject: Re: Issue in Adding/Removing the gluster node
> > >
> > > Hi Gaurav,
> > >
> > > After the add-brick failure, the following is the output of the
> > > "gluster peer status" command:
> > >
> > > Number of Peers: 2
> > >
> > > Hostname: 10.32.1.144
> > > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > > State: Peer in Cluster (Connected)
> > >
> > > Hostname: 10.32.1.144
> > > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > > State: Peer in Cluster (Connected)
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Fri, Feb 19, 2016 at 5:21 PM, ABHISHEK PALIWAL <abhishpaliwal@xxxxxxxxx>
> > > wrote:
> > >
> > > > Hi Gaurav,
> > > >
> > > > Both boards are connected through the backplane using Ethernet.
> > > >
> > > > This inconsistency also occurs when I am bringing the node back into
> > > > the slot. That is, sometimes add-brick executes without failure, but
> > > > sometimes the following error occurs:
> > > >
> > > > volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick
> > > > force : FAILED : Another transaction is in progress for c_glusterfs.
> > > > Please try again after sometime.
> > > >
> > > >
> > > > You can also see the attached logs for add-brick failure scenario.
> > > >
> > > > Please let me know if you need more logs.
> > > >
> > > > Regards,
> > > > Abhishek
> > > >
> > > >
> > > > On Fri, Feb 19, 2016 at 5:03 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
> > > >
> > > >> Hi Abhishek,
> > > >>
> > > >> How are you connecting the two boards, and how are you removing one
> > > >> manually? I need to know, because if you are removing your 2nd board
> > > >> from the cluster (an abrupt shutdown), then you shouldn't be able to
> > > >> perform a remove-brick operation for the 2nd node from the first node,
> > > >> yet that is happening successfully in your case. Could you verify your
> > > >> network connection once more while removing and bringing back your
> > > >> node?
> > > >>
> > > >> Thanks,
> > > >> Gaurav
> > > >>
> > > >> ------------------------------
> > > >> *From: *"ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > > >> *To: *"Gaurav Garg" <ggarg@xxxxxxxxxx>
> > > >> *Cc: *gluster-users@xxxxxxxxxxx
> > > >> *Sent: *Friday, February 19, 2016 3:36:21 PM
> > > >>
> > > >> *Subject: *Re: Issue in Adding/Removing the gluster node
> > > >>
> > > >> Hi Gaurav,
> > > >>
> > > >> Thanks for reply
> > > >>
> > > >> 1. Here I removed the board manually, but this time it worked fine:
> > > >>
> > > >> [2016-02-18 10:03:40.601472]  : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > > >> [2016-02-18 10:03:40.885973]  : peer detach 10.32.1.144 : SUCCESS
> > > >>
> > > >> Yes, this time the board is reachable, but how? I don't know, because
> > > >> the board is detached.
> > > >>
> > > >> 2. Here I attached the board; this time add-brick works fine:
> > > >>
> > > >> [2016-02-18 10:03:42.065038]  : peer probe 10.32.1.144 : SUCCESS
> > > >> [2016-02-18 10:03:44.563546]  : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > > >>
> > > >> 3. Here I removed the board again; this time a failure occurred:
> > > >>
> > > >> [2016-02-18 10:37:02.816089]  : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : FAILED : Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > > >>
> > > >> But here the board is not reachable.
> > > >>
> > > >> Why is there this inconsistency when doing the same steps multiple
> > > >> times?
> > > >>
> > > >> Hope you are getting my point.
> > > >>
> > > >> Regards,
> > > >> Abhishek
> > > >>
> > > >> On Fri, Feb 19, 2016 at 3:25 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
> > > >>
> > > >>> Abhishek,
> > > >>>
> > > >>> When it sometimes works fine, that means the 2nd board's network
> > > >>> connection is reachable from the first node. You can confirm this by
> > > >>> executing the same #gluster peer status command.
> > > >>>
> > > >>> Thanks,
> > > >>> Gaurav
> > > >>>
> > > >>> ----- Original Message -----
> > > >>> From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > > >>> To: "Gaurav Garg" <ggarg@xxxxxxxxxx>
> > > >>> Cc: gluster-users@xxxxxxxxxxx
> > > >>> Sent: Friday, February 19, 2016 3:12:22 PM
> > > >>> Subject: Re: Issue in Adding/Removing the gluster node
> > > >>>
> > > >>> Hi Gaurav,
> > > >>>
> > > >>> Yes, you are right. Actually, I am forcefully detaching the node from
> > > >>> the slave, and when we remove the board it disconnects from the other
> > > >>> board.
> > > >>>
> > > >>> But my question is: I am doing this process multiple times, and
> > > >>> sometimes it works fine but sometimes it gives these errors.
> > > >>>
> > > >>>
> > > >>> You can see the following entries from the cmd_history.log file:
> > > >>>
> > > >>> [2016-02-18 10:03:34.497996]  : volume set c_glusterfs nfs.disable on : SUCCESS
> > > >>> [2016-02-18 10:03:34.915036]  : volume start c_glusterfs force : SUCCESS
> > > >>> [2016-02-18 10:03:40.250326]  : volume status : SUCCESS
> > > >>> [2016-02-18 10:03:40.273275]  : volume status : SUCCESS
> > > >>> [2016-02-18 10:03:40.601472]  : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > > >>> [2016-02-18 10:03:40.885973]  : peer detach 10.32.1.144 : SUCCESS
> > > >>> [2016-02-18 10:03:42.065038]  : peer probe 10.32.1.144 : SUCCESS
> > > >>> [2016-02-18 10:03:44.563546]  : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > > >>> [2016-02-18 10:30:53.297415]  : volume status : SUCCESS
> > > >>> [2016-02-18 10:30:53.313096]  : volume status : SUCCESS
> > > >>> [2016-02-18 10:37:02.748714]  : volume status : SUCCESS
> > > >>> [2016-02-18 10:37:02.762091]  : volume status : SUCCESS
> > > >>> [2016-02-18 10:37:02.816089]  : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : FAILED : Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > > >>>
> > > >>>
> > > >>> On Fri, Feb 19, 2016 at 3:05 PM, Gaurav Garg <ggarg@xxxxxxxxxx> wrote:
> > > >>>
> > > >>> > Hi Abhishek,
> > > >>> >
> > > >>> > It seems your peer 10.32.1.144 disconnected while doing the
> > > >>> > remove-brick. See the below logs from glusterd:
> > > >>> >
> > > >>> > [2016-02-18 10:37:02.816009] E [MSGID: 106256] [glusterd-brick-ops.c:1047:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs [Invalid argument]
> > > >>> > [2016-02-18 10:37:02.816061] E [MSGID: 106265] [glusterd-brick-ops.c:1088:__glusterd_handle_remove_brick] 0-management: Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > > >>> > The message "I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <10.32.1.144> (<6adf57dc-c619-4e56-ae40-90e6aef75fe9>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 25 times between [2016-02-18 10:35:43.131945] and [2016-02-18 10:36:58.160458]
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > If you are facing the same issue now, could you paste your #gluster
> > > >>> > peer status command output here?
> > > >>> >
> > > >>> > Thanks,
> > > >>> > ~Gaurav
> > > >>> >
> > > >>> > ----- Original Message -----
> > > >>> > From: "ABHISHEK PALIWAL" <abhishpaliwal@xxxxxxxxx>
> > > >>> > To: gluster-users@xxxxxxxxxxx
> > > >>> > Sent: Friday, February 19, 2016 2:46:35 PM
> > > >>> > Subject: Issue in Adding/Removing the gluster node
> > > >>> >
> > > >>> > Hi,
> > > >>> >
> > > >>> >
> > > >>> > I am working on a two-board setup where the boards are connected to
> > > >>> > each other. Gluster version 3.7.6 is running with two bricks added in
> > > >>> > replica 2 mode, but when I manually remove (detach) one board from the
> > > >>> > setup I get the following error:
> > > >>> >
> > > >>> > volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : FAILED : Incorrect brick 10.32.1.144:/opt/lvmdir/c2/brick for volume c_glusterfs
> > > >>> >
> > > >>> > Please find the logs file as an attachment.
> > > >>> >
> > > >>> >
> > > >>> > Regards,
> > > >>> > Abhishek
> > > >>> >
> > > >>> >
> > > >>> > _______________________________________________
> > > >>> > Gluster-users mailing list
> > > >>> > Gluster-users@xxxxxxxxxxx
> > > >>> > http://www.gluster.org/mailman/listinfo/gluster-users
> > > >>> >
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> Regards
> > > >>> Abhishek Paliwal
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Regards
> > > >> Abhishek Paliwal
> > > >>
> > > >>
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > >
> > >
> > >
> > >
> > > Regards
> > > Abhishek Paliwal
> > >
> >
>
>
>
> --
>
>
>
>
> Regards
> Abhishek Paliwal
>



--

Regards
Abhishek Paliwal
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
