Re: How do I temporarily take a brick out of service and then put it back later?

Trying this command to remove the brick on the failed node:

[root@lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/firewall-scripts
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
[root@lme-fw2 ~]#

But running top in another window, I see 97.8%wa with 0.0%id.  In other words, the CPU is spending nearly all of its time waiting for disk I/O to complete, even after I removed the brick associated with the dead node.  Meanwhile, gluster volume info hung for several minutes.  After 5+ minutes, it finally told me there are no volumes present.  So what happened to the volumes I set up?
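
For anyone who wants to watch that number outside of top: the wa figure is the share of CPU time spent in iowait over the sampling interval. Below is a minimal sketch that computes it from two samples of the first line of /proc/stat. The function name iowait_pct is made up for illustration, and it assumes the field order described in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# iowait_pct: hypothetical helper, not part of any existing tool.
# Takes two "cpu ..." lines sampled from /proc/stat and prints the
# percentage of elapsed CPU time spent in iowait between the samples.
#
# Example use (not run here):
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) total += $i - first[i]
            # Field 1 is the literal "cpu", so iowait is field 6.
            if (total > 0) printf "%.1f\n", 100 * ($6 - first[6]) / total
            else print "0.0"
        }'
}
```

To see which processes are actually stuck, ps -eo state,pid,comm lists a D in the first column for anything in uninterruptible sleep.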

But check this out:

[root@lme-fw2 ~]# gluster volume info
No volumes present
[root@lme-fw2 ~]#

In another window, I cd to /firewall-scripts, which is my gluster volume, and look at a file.  Then I run gluster volume info again:

[root@lme-fw2 ~]#
[root@lme-fw2 ~]# gluster volume info

Volume Name: firewall-scripts
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2
Options Reconfigured:
network.ping-timeout: 5
[root@lme-fw2 ~]#

And now my volume shows up, with both bricks.  What's up with that?  I removed the old brick, yet it is still listed.  I also set network.ping-timeout to 5 seconds about an hour ago.

So trying to remove the brick again...  At least it generates some output this time:

[root@lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/firewall-scripts
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Incorrect brick 192.168.253.1:/firewall-scripts for volume firewall-scripts
[root@lme-fw2 ~]#
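
A check like the following would have caught the wrong brick name before the remove-brick prompt. It is only a sketch: list_bricks and brick_in_volume are hypothetical helper names, and the parsing assumes the "BrickN: host:/path" layout that gluster volume info printed above.

```shell
# list_bricks: hypothetical helper. Reads "gluster volume info" output
# on stdin and prints one brick (host:/path) per line.
list_bricks() {
    sed -n -e 's/[[:space:]]*$//' \
           -e 's/^Brick[0-9][0-9]*:[[:space:]]*//p'
}

# brick_in_volume: succeeds only if brick $1 appears, as an exact match,
# in the "gluster volume info" text passed as $2.
#
# Example use (not run here):
#   info=$(gluster volume info firewall-scripts)
#   brick_in_volume 192.168.253.1:/gluster-fw1 "$info" || echo "no such brick"
brick_in_volume() {
    printf '%s\n' "$2" | list_bricks | grep -Fxq -- "$1"
}
```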

Ah - my brick name is wrong.  Trying again with the correct brick name....  Uh-oh!

[root@lme-fw2 ~]#
[root@lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/gluster-fw1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# gluster volume info

And we're hung.
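
To keep a hung gluster command from tying up the terminal, it can be run under a time limit. This sketch uses timeout(1) from GNU coreutils, which exits with status 124 when the limit is hit; run_bounded is a made-up wrapper name, and the 30 seconds in the example is an arbitrary choice.

```shell
# run_bounded: hypothetical wrapper. Runs a command under coreutils
# timeout(1) and reports on stderr when the time limit was reached.
#
# Example use (not run here):
#   run_bounded 30 gluster volume info
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

This only bounds the wait. It says nothing about why the command hung in the first place.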

When I tell the surviving node to take out the brick from the failed node, why does Gluster on the surviving node hang?

[root@lme-fw2 firewall-scripts]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root@lme-fw2 firewall-scripts]# pwd
/firewall-scripts
[root@lme-fw2 firewall-scripts]#
[root@lme-fw2 firewall-scripts]# ls
ls: cannot access allow-all-with-nat: Transport endpoint is not connected
ls: cannot access rc.firewall: Transport endpoint is not connected
ls: cannot access rcfirewall.conf: Transport endpoint is not connected
ls: cannot access make-virgin.sh: Transport endpoint is not connected
ls: cannot access start-failover-monitor.sh: Transport endpoint is not connected
ls: cannot access failover-monitor.sh: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120201: Transport endpoint is not connected
ls: cannot access rc.firewall-20120201: Transport endpoint is not connected
ls: cannot access fwdate.txt: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120210: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120302: Transport endpoint is not connected
ls: cannot access rc.firewall-20120302: Transport endpoint is not connected
ls: cannot access failover-monitor.sh-20120406: Transport endpoint is not connected
ls: cannot access rc.firewall-20120704: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120704: Transport endpoint is not connected
ls: cannot access initial_rc.firewall-20120708: Transport endpoint is not connected
ls: cannot access =: Transport endpoint is not connected
ls: cannot access append.txt: Transport endpoint is not connected
ls: cannot access rc.firewall-20120708: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120708: Transport endpoint is not connected
ls: reading directory .: Transport endpoint is not connected
=                    failover-monitor.sh-20120406  rc.firewall-20120302      rcfirewall.conf-20120302
allow-all            fwdate.txt                    rc.firewall-20120704      rcfirewall.conf-20120704
allow-all-with-nat   initial_rc.firewall-20120708  rc.firewall-20120708      rcfirewall.conf-20120708
append.txt           make-virgin.sh                rcfirewall.conf           start-failover-monitor.sh
etc                  rc.firewall                   rcfirewall.conf-20120201
failover-monitor.sh  rc.firewall-20120201          rcfirewall.conf-20120210
[root@lme-fw2 firewall-scripts]#
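
The listing above flips between errors and a normal directory listing, so a scripted check is handier than eyeballing ls. A minimal sketch, assuming GNU coreutils timeout(1) is available; mount_responds is a made-up name and the 5 second limit is arbitrary.

```shell
# mount_responds: hypothetical check. Succeeds if the directory $1 can
# be listed within 5 seconds, fails on an error such as "Transport
# endpoint is not connected" or when the time limit is reached.
# Caveat: a process stuck in uninterruptible sleep may not react to the
# signal that timeout sends, so this can still block in the worst case.
#
# Example use (not run here):
#   mount_responds /firewall-scripts || echo "mount is not answering"
mount_responds() {
    timeout 5 ls -A -- "$1" >/dev/null 2>&1
}
```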
