Hi Ravi,

Success! After flushing the iptables rules on gluster-data, I had to restart glusterd on all three bricks. Now the clients see all the files on /gdata. Thanks for all of your efforts in solving this issue.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Thursday, November 28, 2013 9:15 PM
To: Patrick Haley; gluster-users@xxxxxxxxxxx
Cc: SPostma@xxxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients

On 11/29/2013 04:34 AM, Patrick Haley wrote:
> Hi Ravi,
>
> gluster-data is pingable from gluster-0-0, so I tried detaching and
> reattaching it. I had to use the "force" option on the detach on
> gluster-0-0. The first 2 steps seemed to work; however, step 3 fails.
>
> -----------------
> on gluster-0-0
> -----------------
> [root@nas-0-0 ~]# gluster peer probe gluster-data
> Probe unsuccessful
> Probe returned with unknown errno 107
>
> Now, on gluster-data, gluster isn't seeing the peers
> (although it can still ping them):

Most likely a firewall issue; you need to clear the iptables rules. This link should help you:
http://thr3ads.net/gluster-users/2013/05/2639667-peer-probe-fails-107

> [root@mseas-data ~]# gluster peer status
> No peers present
>
> [root@mseas-data ~]# ping gluster-0-1
> PING gluster-0-1 (10.1.1.11) 56(84) bytes of data.
> 64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=1 ttl=64 time=0.103 ms
> 64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=2 ttl=64 time=0.092 ms
> 64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=3 ttl=64 time=0.094 ms
>
> --- gluster-0-1 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> rtt min/avg/max/mdev = 0.092/0.096/0.103/0.009 ms
>
> Any further thoughts? Thanks.
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley@xxxxxxx
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301
>
> ________________________________________
> From: Ravishankar N [ravishankar@xxxxxxxxxx]
> Sent: Thursday, November 28, 2013 12:32 PM
> To: Patrick Haley; gluster-users@xxxxxxxxxxx
> Subject: Re: After reboot, one brick is not being seen by clients
>
> On 11/28/2013 09:30 PM, Patrick Haley wrote:
>> Hi Ravi,
>>
>> I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is
>>
>> mseas-data:/gdata  /gdata  glusterfs  defaults,_netdev  0 0
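
A minimal sketch for confirming that a client really is on the FUSE mount (rather than NFS) and for remounting it by hand. The mount point /gdata and the volfile server mseas-data are taken from the fstab line above; nothing else is assumed:

# On a client (e.g. mseas): a FUSE client shows up with type fuse.glusterfs
df -hT /gdata
mount | grep /gdata

# Remount by hand if needed; this matches the fstab entry above
umount /gdata
mount -t glusterfs mseas-data:/gdata /gdata
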
>> gluster-data sees the other bricks as connected. The other bricks see each
>> other as connected, but gluster-data as disconnected:
>>
>> ---------------
>> gluster-data:
>> ---------------
>> [root@mseas-data ~]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster-0-1
>> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gluster-0-0
>> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
>> State: Peer in Cluster (Connected)
>>
>> ---------------
>> gluster-0-0:
>> ---------------
>> [root@nas-0-0 ~]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster-data
>> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: gluster-0-1
>> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
>> State: Peer in Cluster (Connected)
>>
>> ---------------
>> gluster-0-1:
>> ---------------
>> [root@nas-0-1 ~]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster-data
>> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: gluster-0-0
>> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
>> State: Peer in Cluster (Connected)
>>
>> Does any of this suggest what I need to look at next?

> Hi Patrick,
> If gluster-data is pingable from the other bricks, you could try
> detaching and reattaching it from gluster-0-0 or 0-1.
> 1) On gluster-0-0:
>    `gluster peer detach gluster-data`; if that fails, `gluster peer
>    detach gluster-data force`
> 2) On gluster-data:
>    `rm -rf /var/lib/glusterd`
>    `service glusterd restart`
> 3) Again on gluster-0-0:
>    `gluster peer probe gluster-data`
>
> Now check if things work.
> PS: You should really do a 'reply-to-all' so that your queries reach a
> wider audience, getting you faster responses from the community. Also
> serves as a double-check in case I goof up :)
>
> I'm off to sleep now.

>> Thanks.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley                          Email:  phaley@xxxxxxx
>> Center for Ocean Engineering       Phone:  (617) 253-6824
>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA  02139-4301
>>
>> ________________________________________
>> From: Ravishankar N [ravishankar@xxxxxxxxxx]
>> Sent: Thursday, November 28, 2013 2:48 AM
>> To: Patrick Haley
>> Cc: gluster-users@xxxxxxxxxxx
>> Subject: Re: After reboot, one brick is not being seen by clients
>>
>> On 11/28/2013 12:52 PM, Patrick Haley wrote:
>>> Hi Ravi,
>>>
>>> Thanks for the reply. If I interpret the output of gluster volume status
>>> correctly, glusterfsd was running:
>>>
>>> [root@mseas-data ~]# gluster volume status
>>> Status of volume: gdata
>>> Gluster process                           Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick gluster-0-0:/mseas-data-0-0         24009   Y       27006
>>> Brick gluster-0-1:/mseas-data-0-1         24009   Y       7063
>>> Brick gluster-data:/data                  24009   Y       2897
>>> NFS Server on localhost                   38467   Y       2903
>>> NFS Server on gluster-0-1                 38467   Y       7069
>>> NFS Server on gluster-0-0                 38467   Y       27012
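
Given the ports in the status output above, a quick way to check from another brick whether gluster-data actually answers on them. This is a rough sketch: 24007 is the stock glusterd management port, 24009 and 38467 are the brick and NFS ports listed above, and nc is assumed to be installed.

# Run from gluster-0-0 or gluster-0-1
for port in 24007 24009 38467; do
    nc -z -w 2 gluster-data $port && echo "port $port reachable" || echo "port $port blocked"
done

A blocked 24007 in particular would match the "errno 107" peer-probe failure seen earlier in the thread and point at the firewall rather than at glusterd itself.
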
>>> For completeness, I tried both "service glusterd restart" and
>>> "gluster volume start gdata force". Neither solved the problem.
>>> Note that after "gluster volume start gdata force", the gluster volume status
>>> command failed:
>>>
>>> [root@mseas-data ~]# gluster volume status
>>> operation failed
>>>
>>> Failed to get names of volumes
>>>
>>> Doing another "service glusterd restart" let the "gluster volume status"
>>> command work, but the clients still don't see the files on mseas-data.

>> Are your clients using fuse mounts or NFS mounts?

>>> A second piece of data: on the other bricks, "gluster volume status" does not
>>> show gluster-data:/data:

>> Hmm, could you check if all 3 bricks are connected? `gluster peer
>> status` on each brick should show the others as connected.

>>> [root@nas-0-0 ~]# gluster volume status
>>> Status of volume: gdata
>>> Gluster process                           Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick gluster-0-0:/mseas-data-0-0         24009   Y       27006
>>> Brick gluster-0-1:/mseas-data-0-1         24009   Y       7063
>>> NFS Server on localhost                   38467   Y       27012
>>> NFS Server on gluster-0-1                 38467   Y       8051
>>>
>>> Any thoughts on what I should look at next?

>> Also noticed that the NFS server process on gluster-0-1 (on which I guess no
>> commands were run) seems to have changed its pid from 7069 to 8051.
>> FWIW, I am able to observe a similar bug
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be
>> investigated.
>>
>> Thanks,
>> Ravi

>>> Thanks again.
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley                          Email:  phaley@xxxxxxx
>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA  02139-4301
>>>
>>> ________________________________________
>>> From: Ravishankar N [ravishankar@xxxxxxxxxx]
>>> Sent: Wednesday, November 27, 2013 11:21 PM
>>> To: Patrick Haley; gluster-users@xxxxxxxxxxx
>>> Subject: Re: After reboot, one brick is not being seen by clients
>>>
>>> On 11/28/2013 03:12 AM, Pat Haley wrote:
>>>> Hi,
>>>>
>>>> We are currently using gluster with 3 bricks. We just
>>>> rebooted one of the bricks (mseas-data, also identified
>>>> as gluster-data), which is actually the main server. After
>>>> rebooting this brick, our client machine (mseas) only sees
>>>> the files on the other 2 bricks. Note that if I mount
>>>> the gluster filespace (/gdata) on the brick I rebooted,
>>>> it sees the entire space.
>>>>
>>>> The last time I had this problem, there was an error in
>>>> one of our /etc/hosts files. This does not seem to be the
>>>> case now.
>>>>
>>>> What else can I look at to debug this problem?
>>>>
>>>> Some information I have from the gluster server:
>>>>
>>>> [root@mseas-data ~]# gluster --version
>>>> glusterfs 3.3.1 built on Oct 11 2012 22:01:05
>>>>
>>>> [root@mseas-data ~]# gluster volume info
>>>>
>>>> Volume Name: gdata
>>>> Type: Distribute
>>>> Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
>>>> Status: Started
>>>> Number of Bricks: 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gluster-0-0:/mseas-data-0-0
>>>> Brick2: gluster-0-1:/mseas-data-0-1
>>>> Brick3: gluster-data:/data
>>>>
>>>> [root@mseas-data ~]# ps -ef | grep gluster
>>>>
>>>> root      2781     1  0 15:16 ?        00:00:00 /usr/sbin/glusterd -p
>>>> /var/run/glusterd.pid
>>>> root      2897     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfsd
>>>> -s localhost --volfile-id gdata.gluster-data.data -p
>>>> /var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
>>>> /tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
>>>> /var/log/glusterfs/bricks/data.log --xlator-option
>>>> *-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
>>>> --brick-port 24009 --xlator-option gdata-server.listen-port=24009
>>>> root      2903     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfs -s
>>>> localhost --volfile-id gluster/nfs -p
>>>> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
>>>> /tmp/d5c892de43c28a1ee7481b780245b789.socket
>>>> root      4258     1  0 15:52 ?        00:00:00 /usr/sbin/glusterfs
>>>> --volfile-id=/gdata --volfile-server=mseas-data /gdata
>>>> root      4475  4033  0 16:35 pts/0    00:00:00 grep gluster

>>> From the ps output, the brick process (glusterfsd) doesn't seem to be
>>> running on the gluster-data server. Run `gluster volume status` and
>>> check if that is indeed the case. If yes, you could either restart
>>> glusterd on the brick node (`service glusterd restart`) or restart the
>>> entire volume (`gluster volume start gdata force`), which should bring
>>> the brick process back online.
>>>
>>> I'm not sure why glusterd did not start the brick process when you
>>> rebooted the machine in the first place. You could perhaps check the
>>> glusterd log for clues.
>>>
>>> Hope this helps,
>>> Ravi
>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley                          Email:  phaley@xxxxxxx
>>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA  02139-4301

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
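
For anyone hitting the same "errno 107" probe failure: flushing iptables, as Pat did, works, but a narrower fix is to allow just the GlusterFS ports between the bricks and make the rules persistent. This is a rough sketch, not from the thread itself: the 10.1.1.0/24 subnet is assumed from the ping output above, 24007 is the default glusterd management port, 24009 and up are the 3.3-era brick ports, 38465-38467 are the gluster NFS ports, and 111 is the portmapper.

# Run on each brick (gluster-data, gluster-0-0, gluster-0-1)
iptables -I INPUT -p tcp -s 10.1.1.0/24 --dport 24007:24047 -j ACCEPT
iptables -I INPUT -p tcp -s 10.1.1.0/24 --dport 38465:38467 -j ACCEPT
iptables -I INPUT -p tcp -s 10.1.1.0/24 --dport 111 -j ACCEPT
iptables -I INPUT -p udp -s 10.1.1.0/24 --dport 111 -j ACCEPT
service iptables save        # persist the rules across reboots (RHEL/CentOS)
service glusterd restart     # then restart glusterd, as was done on all three bricks
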