Hi Ravi,

gluster-data is pingable from gluster-0-0, so I tried the detaching/reattaching.
I had to use the "force" option on the detach on gluster-0-0. The first 2 steps
seemed to work; however, step 3 fails.

-----------------
on gluster-0-0
-----------------
[root@nas-0-0 ~]# gluster peer probe gluster-data
Probe unsuccessful
Probe returned with unknown errno 107

Now, on gluster-data, gluster isn't seeing the peers (although it can still ping them):

[root@mseas-data ~]# gluster peer status
No peers present

[root@mseas-data ~]# ping gluster-0-1
PING gluster-0-1 (10.1.1.11) 56(84) bytes of data.
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=3 ttl=64 time=0.094 ms

--- gluster-0-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.092/0.096/0.103/0.009 ms

Any further thoughts? Thanks.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Thursday, November 28, 2013 12:32 PM
To: Patrick Haley; gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients

On 11/28/2013 09:30 PM, Patrick Haley wrote:
> Hi Ravi,
>
> I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is
>
> mseas-data:/gdata    /gdata    glusterfs    defaults,_netdev    0 0
>
> gluster-data sees the other bricks as connected. The other bricks see each
> other as connected but gluster-data as disconnected:
>
> ---------------
> gluster-data:
> ---------------
> [root@mseas-data ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: gluster-0-1
> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
> State: Peer in Cluster (Connected)
>
> Hostname: gluster-0-0
> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
> State: Peer in Cluster (Connected)
>
> -------------
> gluster-0-0:
> -------------
> [root@nas-0-0 ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: gluster-data
> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
> State: Peer in Cluster (Disconnected)
>
> Hostname: gluster-0-1
> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
> State: Peer in Cluster (Connected)
>
> -------------
> gluster-0-1:
> -------------
> [root@nas-0-1 ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: gluster-data
> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
> State: Peer in Cluster (Disconnected)
>
> Hostname: gluster-0-0
> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
> State: Peer in Cluster (Connected)
>
> Does any of this suggest what I need to look at next?

Hi Patrick,

If gluster-data is pingable from the other bricks, you could try detaching and
reattaching it from gluster-0-0 or gluster-0-1:

1) On gluster-0-0: `gluster peer detach gluster-data`; if that fails,
   `gluster peer detach gluster-data force`
2) On gluster-data:
   `rm -rf /var/lib/glusterd`
   `service glusterd restart`
3) Again on gluster-0-0: `gluster peer probe gluster-data`

Now check if things work.

PS: You should really do a "reply-to-all" so that your queries reach a wider
audience, getting you faster responses from the community. It also serves as a
double-check in case I goof up :) I'm off to sleep now.
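A note on the failure mode seen at the top of this thread: errno 107 is ENOTCONN
("Transport endpoint is not connected"), and on a peer probe it usually means the
glusterd on the probed node cannot be reached on the management port (TCP 24007).
A minimal check, assuming the hostnames above and that glusterd on gluster-data
was restarted after /var/lib/glusterd was wiped, might look like:

    # On gluster-data: confirm glusterd actually came back up and is listening
    service glusterd status
    netstat -tlnp | grep 24007     # glusterd should be listening on TCP 24007

    # On gluster-0-0: confirm the management port is reachable (no firewall in the way)
    ping -c 3 gluster-data
    telnet gluster-data 24007      # or: nc -z gluster-data 24007, if telnet isn't installed

If 24007 is not reachable, check iptables on gluster-data and the glusterd log
(typically /var/log/glusterfs/etc-glusterfs-glusterd.vol.log) before re-running
the probe.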
>
> Thanks.
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley@xxxxxxx
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301
>
>
> ________________________________________
> From: Ravishankar N [ravishankar@xxxxxxxxxx]
> Sent: Thursday, November 28, 2013 2:48 AM
> To: Patrick Haley
> Cc: gluster-users@xxxxxxxxxxx
> Subject: Re: After reboot, one brick is not being seen by clients
>
> On 11/28/2013 12:52 PM, Patrick Haley wrote:
>> Hi Ravi,
>>
>> Thanks for the reply. If I interpret the output of gluster volume status
>> correctly, glusterfsd was running:
>>
>> [root@mseas-data ~]# gluster volume status
>> Status of volume: gdata
>> Gluster process                                    Port    Online   Pid
>> ------------------------------------------------------------------------------
>> Brick gluster-0-0:/mseas-data-0-0                  24009   Y        27006
>> Brick gluster-0-1:/mseas-data-0-1                  24009   Y        7063
>> Brick gluster-data:/data                           24009   Y        2897
>> NFS Server on localhost                            38467   Y        2903
>> NFS Server on gluster-0-1                          38467   Y        7069
>> NFS Server on gluster-0-0                          38467   Y        27012
>>
>> For completeness, I tried both "service glusterd restart" and
>> "gluster volume start gdata force". Neither solved the problem.
>> Note that after "gluster volume start gdata force", the gluster volume status
>> command failed:
>>
>> [root@mseas-data ~]# gluster volume status
>> operation failed
>>
>> Failed to get names of volumes
>>
>> Doing another "service glusterd restart" let the "gluster volume status"
>> command work, but the clients still don't see the files on mseas-data.
> Are your clients using fuse mounts or NFS mounts?
>> A second piece of data: on the other bricks, "gluster volume status" does not
>> show gluster-data:/data:
> Hmm, could you check if all 3 bricks are connected? `gluster peer
> status` on each brick should show the others as connected.
>> [root@nas-0-0 ~]# gluster volume status
>> Status of volume: gdata
>> Gluster process                                    Port    Online   Pid
>> ------------------------------------------------------------------------------
>> Brick gluster-0-0:/mseas-data-0-0                  24009   Y        27006
>> Brick gluster-0-1:/mseas-data-0-1                  24009   Y        7063
>> NFS Server on localhost                            38467   Y        27012
>> NFS Server on gluster-0-1                          38467   Y        8051
>>
>> Any thoughts on what I should look at next?
> Also noticed the NFS server process on gluster-0-1 (on which I guess no
> commands were run) seems to have changed its pid from 7069 to 8051.
> FWIW, I am able to observe a similar bug
> (https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be
> investigated.
>
> Thanks,
> Ravi
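Regarding the fuse-vs-NFS question above: a quick way to tell from the client
side, assuming the mseas-data:/gdata mount on /gdata shown in the fstab line
earlier in this thread, is:

    # On the client (mseas)
    grep gdata /etc/fstab      # fstab type "glusterfs" means a native (fuse) mount
    mount | grep /gdata        # a fuse mount shows up as type fuse.glusterfs; an NFS mount as type nfs

The distinction matters for the symptom being chased here: a fuse client talks
to every brick directly, so a brick whose glusterfsd is down (or which has
dropped out of the peer cluster) simply disappears from the client's view of a
Distribute volume, while an NFS client goes through the glusterfs NFS server on
whichever node it mounted from.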
>> Thanks again.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley                          Email:  phaley@xxxxxxx
>> Center for Ocean Engineering       Phone:  (617) 253-6824
>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA  02139-4301
>>
>>
>> ________________________________________
>> From: Ravishankar N [ravishankar@xxxxxxxxxx]
>> Sent: Wednesday, November 27, 2013 11:21 PM
>> To: Patrick Haley; gluster-users@xxxxxxxxxxx
>> Subject: Re: After reboot, one brick is not being seen by clients
>>
>> On 11/28/2013 03:12 AM, Pat Haley wrote:
>>> Hi,
>>>
>>> We are currently using gluster with 3 bricks. We just
>>> rebooted one of the bricks (mseas-data, also identified
>>> as gluster-data), which is actually the main server. After
>>> rebooting this brick, our client machine (mseas) only sees
>>> the files on the other 2 bricks. Note that if I mount
>>> the gluster filespace (/gdata) on the brick I rebooted,
>>> it sees the entire space.
>>>
>>> The last time I had this problem, there was an error in
>>> one of our /etc/hosts files. This does not seem to be the
>>> case now.
>>>
>>> What else can I look at to debug this problem?
>>>
>>> Some information I have from the gluster server:
>>>
>>> [root@mseas-data ~]# gluster --version
>>> glusterfs 3.3.1 built on Oct 11 2012 22:01:05
>>>
>>> [root@mseas-data ~]# gluster volume info
>>>
>>> Volume Name: gdata
>>> Type: Distribute
>>> Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
>>> Status: Started
>>> Number of Bricks: 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gluster-0-0:/mseas-data-0-0
>>> Brick2: gluster-0-1:/mseas-data-0-1
>>> Brick3: gluster-data:/data
>>>
>>> [root@mseas-data ~]# ps -ef | grep gluster
>>>
>>> root      2781     1  0 15:16 ?        00:00:00 /usr/sbin/glusterd -p
>>> /var/run/glusterd.pid
>>> root      2897     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfsd
>>> -s localhost --volfile-id gdata.gluster-data.data -p
>>> /var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
>>> /tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
>>> /var/log/glusterfs/bricks/data.log --xlator-option
>>> *-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
>>> --brick-port 24009 --xlator-option gdata-server.listen-port=24009
>>> root      2903     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfs -s
>>> localhost --volfile-id gluster/nfs -p
>>> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
>>> /tmp/d5c892de43c28a1ee7481b780245b789.socket
>>> root      4258     1  0 15:52 ?        00:00:00 /usr/sbin/glusterfs
>>> --volfile-id=/gdata --volfile-server=mseas-data /gdata
>>> root      4475  4033  0 16:35 pts/0    00:00:00 grep gluster
>>>
>> From the ps output, the brick process (glusterfsd) doesn't seem to be
>> running on the gluster-data server. Run `gluster volume status` and
>> check if that is indeed the case. If yes, you could either restart
>> glusterd on the brick node (`service glusterd restart`) or restart the
>> entire volume (`gluster volume start gdata force`), which should bring
>> the brick process back online.
>>
>> I'm not sure why glusterd did not start the brick process when you
>> rebooted the machine in the first place. You could perhaps check the
>> glusterd log for clues.
>>
>> Hope this helps,
>> Ravi
>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley                          Email:  phaley@xxxxxxx
>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA  02139-4301
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@xxxxxxxxxxx
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
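For anyone hitting the same symptom (one brick not seen by clients after that
server is rebooted), the recovery steps discussed in this thread boil down
roughly to the following, using the volume and host names from above; this is a
sketch of what was tried here rather than a verified recipe:

    # 1. On the rebooted server, check whether its brick process is online
    gluster volume status gdata        # the gluster-data:/data brick should show Online = Y

    # 2. If it is not, restart glusterd and/or force-start the volume
    service glusterd restart
    gluster volume start gdata force

    # 3. If the other nodes report the rebooted peer as Disconnected, check connectivity
    gluster peer status                # run on each of the three servers
    ping -c 3 gluster-data             # from gluster-0-0 and gluster-0-1

    # 4. Last resort discussed above: detach and re-probe the node after wiping its
    #    glusterd state (this removes its peer and volume metadata, so use with care)
    gluster peer detach gluster-data force                    # on gluster-0-0
    rm -rf /var/lib/glusterd && service glusterd restart      # on gluster-data
    gluster peer probe gluster-data                           # back on gluster-0-0

If the probe still fails with errno 107, the glusterd log on gluster-data
(typically /var/log/glusterfs/etc-glusterfs-glusterd.vol.log) is the place to
look next.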