On 11/29/2013 04:34 AM, Patrick Haley wrote:
Hi Ravi,
gluster-data is pingable from gluster-0-0, so I tried the detaching/
reattaching. I had to use the "force" option on the detach on
gluster-0-0. The first two steps seemed to work; however, step 3 fails:
-----------------
on gluster-0-0
-----------------
[root@nas-0-0 ~]# gluster peer probe gluster-data
Probe unsuccessful
Probe returned with unknown errno 107
Now, on gluster-data, gluster isn't seeing the peers
(although it can still ping them):
Most likely a firewall issue; you need to clear the iptables rules. This
link should help you:
http://thr3ads.net/gluster-users/2013/05/2639667-peer-probe-fails-107
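(For reference, errno 107 is ENOTCONN, "Transport endpoint is not
connected", which usually means glusterd on the peer could not be
reached.) As a rough sketch of what I would check on each node, assuming
a RHEL/CentOS-style iptables setup:

    # list current rules and look for anything blocking the gluster ports
    iptables -L -n | grep -E '24007|24009|38467'
    # or temporarily disable the firewall just to re-test the probe
    service iptables stop

glusterd itself listens on 24007; per your volume status output the
bricks use 24009 and gluster NFS uses 38467, so those ports need to be
open between all the peers.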
[root@mseas-data ~]# gluster peer status
No peers present
[root@mseas-data ~]# ping gluster-0-1
PING gluster-0-1 (10.1.1.11) 56(84) bytes of data.
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=3 ttl=64 time=0.094 ms
--- gluster-0-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.092/0.096/0.103/0.009 ms
Any further thoughts? Thanks.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Thursday, November 28, 2013 12:32 PM
To: Patrick Haley; gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients
On 11/28/2013 09:30 PM, Patrick Haley wrote:
Hi Ravi,
I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is
mseas-data:/gdata /gdata glusterfs defaults,_netdev 0 0
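As a sanity check, something like this on a client should confirm the
mount type:

    grep gdata /proc/mounts    # a fuse mount shows up with type fuse.glusterfs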
gluster-data sees the other bricks as connected. The other bricks see each
other as connected but gluster-data as disconnected:
---------------
gluster-data:
---------------
[root@mseas-data ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)
Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)
-------------
gluster-0-0:
--------------
[root@nas-0-0 ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)
Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)
-------------
gluster-0-1:
--------------
[root@nas-0-1 ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)
Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)
Does any of this suggest what I need to look at next?
Hi Patrick,
If gluster-data is pingable from the other bricks, you could try
detaching and reattaching it from gluster-0-0 or 0-1.
1) On gluster-0-0:
`gluster peer detach gluster-data`, if that fails, `gluster peer
detach gluster-data force`
2) On gluster-data:
`rm -rf /var/lib/glusterd`
`service glusterd restart`
3) Again on gluster-0-0:
`gluster peer probe gluster-data`
Now check if things work.
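For instance, if the probe succeeds, something like this on each of the
three nodes should show the cluster healthy again:

    gluster peer status     # the other two nodes should show "Peer in Cluster (Connected)"
    gluster volume status   # the gluster-data:/data brick should be listed online again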
PS: You should really do a 'reply-to-all' so that your queries reach a
wider audience, getting you faster responses from the community. It also
serves as a double-check in case I goof up :)
I'm off to sleep now.
Thanks.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Thursday, November 28, 2013 2:48 AM
To: Patrick Haley
Cc: gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients
On 11/28/2013 12:52 PM, Patrick Haley wrote:
Hi Ravi,
Thanks for the reply. If I interpret the output of "gluster volume
status" correctly, glusterfsd was running:
[root@mseas-data ~]# gluster volume status
Status of volume: gdata
Gluster process                               Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster-0-0:/mseas-data-0-0             24009   Y       27006
Brick gluster-0-1:/mseas-data-0-1             24009   Y       7063
Brick gluster-data:/data                      24009   Y       2897
NFS Server on localhost                       38467   Y       2903
NFS Server on gluster-0-1                     38467   Y       7069
NFS Server on gluster-0-0                     38467   Y       27012
For completeness, I tried both "service glusterd restart" and
"gluster volume start gdata force". Neither solved the problem.
Note that after "gluster volume start gdata force" the gluster volume status
failed
[root@mseas-data ~]# gluster volume status
operation failed
Failed to get names of volumes
Doing another "service glusterd restart" let the "gluster volume status"
command work, but the clients still don't see the files on mseas-data.
Are your clients using fuse mounts or NFS mounts?
A second piece of data: on the other bricks, "gluster volume status"
does not show gluster-data:/data:
Hmm, could you check if all 3 bricks are connected? `gluster peer
status` on each brick should show the others as connected.
[root@nas-0-0 ~]# gluster volume status
Status of volume: gdata
Gluster process                               Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster-0-0:/mseas-data-0-0             24009   Y       27006
Brick gluster-0-1:/mseas-data-0-1             24009   Y       7063
NFS Server on localhost                       38467   Y       27012
NFS Server on gluster-0-1                     38467   Y       8051
Any thoughts on what I should look at next?
Also, I noticed the NFS server process on gluster-0-1 (on which I guess no
commands were run) seems to have changed its pid from 7069 to 8051.
FWIW, I am able to observe a similar bug
(https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be
investigated.
Thanks,
Ravi
Thanks again.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Wednesday, November 27, 2013 11:21 PM
To: Patrick Haley; gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients
On 11/28/2013 03:12 AM, Pat Haley wrote:
Hi,
We are currently using gluster with 3 bricks. We just
rebooted one of the bricks (mseas-data, also identified
as gluster-data) which is actually the main server. After
rebooting this brick, our client machine (mseas) only sees
the files on the other 2 bricks. Note that if I mount
the gluster filespace (/gdata) on the brick I rebooted,
it sees the entire space.
The last time I had this problem, there was an error in
one of our /etc/hosts file. This does not seem to be the
case now.
What else can I look at to debug this problem?
Some information I have from the gluster server
[root@mseas-data ~]# gluster --version
glusterfs 3.3.1 built on Oct 11 2012 22:01:05
[root@mseas-data ~]# gluster volume info
Volume Name: gdata
Type: Distribute
Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster-0-0:/mseas-data-0-0
Brick2: gluster-0-1:/mseas-data-0-1
Brick3: gluster-data:/data
[root@mseas-data ~]# ps -ef | grep gluster
root 2781 1 0 15:16 ? 00:00:00 /usr/sbin/glusterd -p
/var/run/glusterd.pid
root 2897 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfsd
-s localhost --volfile-id gdata.gluster-data.data -p
/var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
/tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
/var/log/glusterfs/bricks/data.log --xlator-option
*-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
--brick-port 24009 --xlator-option gdata-server.listen-port=24009
root 2903 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfs -s
localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/tmp/d5c892de43c28a1ee7481b780245b789.socket
root 4258 1 0 15:52 ? 00:00:00 /usr/sbin/glusterfs
--volfile-id=/gdata --volfile-server=mseas-data /gdata
root 4475 4033 0 16:35 pts/0 00:00:00 grep gluster
From the ps output, the brick process (glusterfsd) doesn't seem to be
running on the gluster-data server. Run `gluster volume status` and
check if that is indeed the case. If yes, you could either restart
glusterd on the brick node (`service glusterd restart`) or restart the
entire volume (`gluster volume start gdata force`), which should bring
the brick process back online.
I'm not sure why glusterd did not start the brick process when you
rebooted the machine in the first place. You could perhaps check the
glusterd log for clues.
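On 3.3 the glusterd log is usually
/var/lib/glusterfs is not it; I mean /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
(the exact filename may vary with your packaging); something along these
lines might surface the failure:

    # show the most recent error-level ("E") messages from glusterd
    grep ' E ' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -20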
Hope this helps,
Ravi
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley@xxxxxxx
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users