On 11/29/2013 04:34 AM, Patrick Haley wrote:
Hi Ravi,
gluster-data is pingable from gluster-0-0, so I tried the detaching/
reattaching. I had to use the "force" option on the detach on
gluster-0-0. The first two steps seemed to work; however, step 3 fails:
-----------------
on gluster-0-0
-----------------
[root@nas-0-0 ~]# gluster peer probe gluster-data
Probe unsuccessful
Probe returned with unknown errno 107
Now, on gluster-data, gluster isn't seeing the peers
(although it can still ping them):
Most likely a firewall issue; you need to clear the iptables rules. This
link should help you:
http://thr3ads.net/gluster-users/2013/05/2639667-peer-probe-fails-107
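(For reference, errno 107 is ENOTCONN, "Transport endpoint is not
connected", which usually means glusterd on the peer could not be
reached.) As a rough sketch of what I would check on each node, assuming
a RHEL/CentOS-style iptables setup:

    # list current rules and look for anything blocking the gluster ports
    iptables -L -n | grep -E '24007|24009|38467'
    # or temporarily disable the firewall just to re-test the probe
    service iptables stop

glusterd itself listens on 24007; per your volume status output the
bricks use 24009 and gluster NFS uses 38467, so those ports need to be
open between all the peers.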
[root@mseas-data ~]# gluster peer status
No peers present
[root@mseas-data ~]# ping gluster-0-1
PING gluster-0-1 (10.1.1.11) 56(84) bytes of data.
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=3 ttl=64 time=0.094 ms
--- gluster-0-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.092/0.096/0.103/0.009 ms
Any further thoughts? Thanks.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Thursday, November 28, 2013 12:32 PM
To: Patrick Haley; gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients
On 11/28/2013 09:30 PM, Patrick Haley wrote:
Hi Ravi,
I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is
mseas-data:/gdata /gdata glusterfs defaults,_netdev 0 0
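As a sanity check, something like this on a client should confirm the
mount type:

    grep gdata /proc/mounts    # a fuse mount shows up with type fuse.glusterfs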
gluster-data sees the other bricks as connected. The other bricks see each
other as connected but gluster-data as disconnected:
---------------
gluster-data:
---------------
[root@mseas-data ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)
Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)
-------------
gluster-0-0:
--------------
[root@nas-0-0 ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)
Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)
-------------
gluster-0-1:
--------------
[root@nas-0-1 ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)
Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)
Does any of this suggest what I need to look at next?
Hi Patrick,
If gluster-data is pingable from the other bricks, you could try
detaching and reattaching it from gluster-0-0 or 0-1.
1) On gluster-0-0:
`gluster peer detach gluster-data`, if that fails, `gluster peer
detach gluster-data force`
2) On gluster-data:
`rm -rf /var/lib/glusterd`
`service glusterd restart`
3) Again on gluster-0-0:
`gluster peer probe gluster-data`
Now check if things work.
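For instance, if the probe succeeds, something like this on each of the
three nodes should show the cluster healthy again:

    gluster peer status     # the other two nodes should show "Peer in Cluster (Connected)"
    gluster volume status   # the gluster-data:/data brick should be listed online again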
PS: You should really do a 'reply-to-all' so that your queries reach a
wider audience, getting you faster responses from the community. It also
serves as a double-check in case I goof up :)
I'm off to sleep now.
Thanks.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Thursday, November 28, 2013 2:48 AM
To: Patrick Haley
Cc: gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients
On 11/28/2013 12:52 PM, Patrick Haley wrote:
Hi Ravi,
Thanks for the reply. If I interpret the output of "gluster volume
status" correctly, glusterfsd was running:
[root@mseas-data ~]# gluster volume status
Status of volume: gdata
Gluster process                               Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster-0-0:/mseas-data-0-0             24009   Y       27006
Brick gluster-0-1:/mseas-data-0-1             24009   Y       7063
Brick gluster-data:/data                      24009   Y       2897
NFS Server on localhost                       38467   Y       2903
NFS Server on gluster-0-1                     38467   Y       7069
NFS Server on gluster-0-0                     38467   Y       27012
For completeness, I tried both "service glusterd restart" and
"gluster volume start gdata force". Neither solved the problem.
Note that after "gluster volume start gdata force" the gluster volume status
failed
[root@mseas-data ~]# gluster volume status
operation failed
Failed to get names of volumes
Doing another "service glusterd restart" let the "gluster volume status"
command work, but the clients still don't see the files on mseas-data.
Are your clients using fuse mounts or NFS mounts?
A second piece of data: on the other bricks, "gluster volume status"
does not show gluster-data:/data:
Hmm, could you check if all 3 bricks are connected? `gluster peer
status` on each brick should show the others as connected.
[root@nas-0-0 ~]# gluster volume status
Status of volume: gdata
Gluster process                               Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster-0-0:/mseas-data-0-0             24009   Y       27006
Brick gluster-0-1:/mseas-data-0-1             24009   Y       7063
NFS Server on localhost                       38467   Y       27012
NFS Server on gluster-0-1                     38467   Y       8051
Any thoughts on what I should look at next?
Also, I noticed the NFS server process on gluster-0-1 (on which I guess no
commands were run) seems to have changed its pid from 7069 to 8051.
FWIW, I am able to observe a similar bug
(https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be
investigated.
Thanks,
Ravi
Thanks again.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Wednesday, November 27, 2013 11:21 PM
To: Patrick Haley; gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients
On 11/28/2013 03:12 AM, Pat Haley wrote:
Hi,
We are currently using gluster with 3 bricks. We just
rebooted one of the bricks (mseas-data, also identified
as gluster-data) which is actually the main server. After
rebooting this brick, our client machine (mseas) only sees
the files on the other 2 bricks. Note that if I mount
the gluster filespace (/gdata) on the brick I rebooted,
it sees the entire space.
The last time I had this problem, there was an error in
one of our /etc/hosts file. This does not seem to be the
case now.
What else can I look at to debug this problem?
Some information I have from the gluster server
[root@mseas-data ~]# gluster --version
glusterfs 3.3.1 built on Oct 11 2012 22:01:05
[root@mseas-data ~]# gluster volume info
Volume Name: gdata
Type: Distribute
Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster-0-0:/mseas-data-0-0
Brick2: gluster-0-1:/mseas-data-0-1
Brick3: gluster-data:/data
[root@mseas-data ~]# ps -ef | grep gluster
root 2781 1 0 15:16 ? 00:00:00 /usr/sbin/glusterd -p
/var/run/glusterd.pid
root 2897 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfsd
-s localhost --volfile-id gdata.gluster-data.data -p
/var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
/tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
/var/log/glusterfs/bricks/data.log --xlator-option
*-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
--brick-port 24009 --xlator-option gdata-server.listen-port=24009
root 2903 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfs -s
localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/tmp/d5c892de43c28a1ee7481b780245b789.socket
root 4258 1 0 15:52 ? 00:00:00 /usr/sbin/glusterfs
--volfile-id=/gdata --volfile-server=mseas-data /gdata
root 4475 4033 0 16:35 pts/0 00:00:00 grep gluster
From the ps output, the brick process (glusterfsd) doesn't seem to be
running on the gluster-data server. Run `gluster volume status` and
check if that is indeed the case. If yes, you could either restart
glusterd on the brick node (`service glusterd restart`) or restart the
entire volume (`gluster volume start gdata force`), which should bring
the brick process back online.
I'm not sure why glusterd did not start the brick process when you
rebooted the machine in the first place. You could perhaps check the
glusterd log for clues.
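On 3.3 the glusterd log is usually
/var/lib/glusterfs is not it; I mean /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
(the exact filename may vary with your packaging); something along these
lines might surface the failure:

    # show the most recent error-level ("E") messages from glusterd
    grep ' E ' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -20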
Hope this helps,
Ravi
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users