Hi Ravi,

gluster-data is pingable from gluster-0-0, so I tried the detaching/reattaching.
I had to use the "force" option on the detach on gluster-0-0. The first 2 steps
seemed to work; however, step 3 fails.

-----------------
on gluster-0-0
-----------------
[root@nas-0-0 ~]# gluster peer probe gluster-data
Probe unsuccessful
Probe returned with unknown errno 107

Now, on gluster-data, gluster isn't seeing the peers (although it can still ping them):

[root@mseas-data ~]# gluster peer status
No peers present

[root@mseas-data ~]# ping gluster-0-1
PING gluster-0-1 (10.1.1.11) 56(84) bytes of data.
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=3 ttl=64 time=0.094 ms

--- gluster-0-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.092/0.096/0.103/0.009 ms

Any further thoughts? Thanks.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

________________________________________
From: Ravishankar N [ravishankar@xxxxxxxxxx]
Sent: Thursday, November 28, 2013 12:32 PM
To: Patrick Haley; gluster-users@xxxxxxxxxxx
Subject: Re: After reboot, one brick is not being seen by clients

On 11/28/2013 09:30 PM, Patrick Haley wrote:
> Hi Ravi,
>
> I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is
>
> mseas-data:/gdata    /gdata    glusterfs    defaults,_netdev    0 0
>
> gluster-data sees the other bricks as connected. The other bricks see each
> other as connected but gluster-data as disconnected:
>
> ---------------
> gluster-data:
> ---------------
> [root@mseas-data ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: gluster-0-1
> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
> State: Peer in Cluster (Connected)
>
> Hostname: gluster-0-0
> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
> State: Peer in Cluster (Connected)
>
> -------------
> gluster-0-0:
> -------------
> [root@nas-0-0 ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: gluster-data
> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
> State: Peer in Cluster (Disconnected)
>
> Hostname: gluster-0-1
> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
> State: Peer in Cluster (Connected)
>
> -------------
> gluster-0-1:
> -------------
> [root@nas-0-1 ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: gluster-data
> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
> State: Peer in Cluster (Disconnected)
>
> Hostname: gluster-0-0
> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
> State: Peer in Cluster (Connected)
>
> Does any of this suggest what I need to look at next?

Hi Patrick,

If gluster-data is pingable from the other bricks, you could try detaching and
reattaching it from gluster-0-0 or gluster-0-1:

1) On gluster-0-0: `gluster peer detach gluster-data`; if that fails,
   `gluster peer detach gluster-data force`
2) On gluster-data:
   `rm -rf /var/lib/glusterd`
   `service glusterd restart`
3) Again on gluster-0-0: `gluster peer probe gluster-data`

Now check if things work.

PS: You should really do a "reply-to-all" so that your queries reach a wider
audience, getting you faster responses from the community. It also serves as a
double-check in case I goof up :) I'm off to sleep now.
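A note on the failure mode seen at the top of this thread: errno 107 is ENOTCONN
("Transport endpoint is not connected"), and on a peer probe it usually means the
glusterd on the probed node cannot be reached on the management port (TCP 24007).
A minimal check, assuming the hostnames above and that glusterd on gluster-data
was restarted after /var/lib/glusterd was wiped, might look like:

    # On gluster-data: confirm glusterd actually came back up and is listening
    service glusterd status
    netstat -tlnp | grep 24007     # glusterd should be listening on TCP 24007

    # On gluster-0-0: confirm the management port is reachable (no firewall in the way)
    ping -c 3 gluster-data
    telnet gluster-data 24007      # or: nc -z gluster-data 24007, if telnet isn't installed

If 24007 is not reachable, check iptables on gluster-data and the glusterd log
(typically /var/log/glusterfs/etc-glusterfs-glusterd.vol.log) before re-running
the probe.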
>
> Thanks.
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley@xxxxxxx
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301
>
>
> ________________________________________
> From: Ravishankar N [ravishankar@xxxxxxxxxx]
> Sent: Thursday, November 28, 2013 2:48 AM
> To: Patrick Haley
> Cc: gluster-users@xxxxxxxxxxx
> Subject: Re: After reboot, one brick is not being seen by clients
>
> On 11/28/2013 12:52 PM, Patrick Haley wrote:
>> Hi Ravi,
>>
>> Thanks for the reply. If I interpret the output of gluster volume status
>> correctly, glusterfsd was running:
>>
>> [root@mseas-data ~]# gluster volume status
>> Status of volume: gdata
>> Gluster process                                    Port    Online   Pid
>> ------------------------------------------------------------------------------
>> Brick gluster-0-0:/mseas-data-0-0                  24009   Y        27006
>> Brick gluster-0-1:/mseas-data-0-1                  24009   Y        7063
>> Brick gluster-data:/data                           24009   Y        2897
>> NFS Server on localhost                            38467   Y        2903
>> NFS Server on gluster-0-1                          38467   Y        7069
>> NFS Server on gluster-0-0                          38467   Y        27012
>>
>> For completeness, I tried both "service glusterd restart" and
>> "gluster volume start gdata force". Neither solved the problem.
>> Note that after "gluster volume start gdata force", the gluster volume status
>> command failed:
>>
>> [root@mseas-data ~]# gluster volume status
>> operation failed
>>
>> Failed to get names of volumes
>>
>> Doing another "service glusterd restart" let the "gluster volume status"
>> command work, but the clients still don't see the files on mseas-data.
> Are your clients using fuse mounts or NFS mounts?
>> A second piece of data: on the other bricks, "gluster volume status" does not
>> show gluster-data:/data:
> Hmm, could you check if all 3 bricks are connected? `gluster peer
> status` on each brick should show the others as connected.
>> [root@nas-0-0 ~]# gluster volume status
>> Status of volume: gdata
>> Gluster process                                    Port    Online   Pid
>> ------------------------------------------------------------------------------
>> Brick gluster-0-0:/mseas-data-0-0                  24009   Y        27006
>> Brick gluster-0-1:/mseas-data-0-1                  24009   Y        7063
>> NFS Server on localhost                            38467   Y        27012
>> NFS Server on gluster-0-1                          38467   Y        8051
>>
>> Any thoughts on what I should look at next?
> Also noticed the NFS server process on gluster-0-1 (on which I guess no
> commands were run) seems to have changed its pid from 7069 to 8051.
> FWIW, I am able to observe a similar bug
> (https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be
> investigated.
>
> Thanks,
> Ravi
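Regarding the fuse-vs-NFS question above: a quick way to tell from the client
side, assuming the mseas-data:/gdata mount on /gdata shown in the fstab line
earlier in this thread, is:

    # On the client (mseas)
    grep gdata /etc/fstab      # fstab type "glusterfs" means a native (fuse) mount
    mount | grep /gdata        # a fuse mount shows up as type fuse.glusterfs; an NFS mount as type nfs

The distinction matters for the symptom being chased here: a fuse client talks
to every brick directly, so a brick whose glusterfsd is down (or which has
dropped out of the peer cluster) simply disappears from the client's view of a
Distribute volume, while an NFS client goes through the glusterfs NFS server on
whichever node it mounted from.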
>> Thanks again.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley                          Email:  phaley@xxxxxxx
>> Center for Ocean Engineering       Phone:  (617) 253-6824
>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA  02139-4301
>>
>>
>> ________________________________________
>> From: Ravishankar N [ravishankar@xxxxxxxxxx]
>> Sent: Wednesday, November 27, 2013 11:21 PM
>> To: Patrick Haley; gluster-users@xxxxxxxxxxx
>> Subject: Re: After reboot, one brick is not being seen by clients
>>
>> On 11/28/2013 03:12 AM, Pat Haley wrote:
>>> Hi,
>>>
>>> We are currently using gluster with 3 bricks. We just
>>> rebooted one of the bricks (mseas-data, also identified
>>> as gluster-data), which is actually the main server. After
>>> rebooting this brick, our client machine (mseas) only sees
>>> the files on the other 2 bricks. Note that if I mount
>>> the gluster filespace (/gdata) on the brick I rebooted,
>>> it sees the entire space.
>>>
>>> The last time I had this problem, there was an error in
>>> one of our /etc/hosts files. This does not seem to be the
>>> case now.
>>>
>>> What else can I look at to debug this problem?
>>>
>>> Some information I have from the gluster server:
>>>
>>> [root@mseas-data ~]# gluster --version
>>> glusterfs 3.3.1 built on Oct 11 2012 22:01:05
>>>
>>> [root@mseas-data ~]# gluster volume info
>>>
>>> Volume Name: gdata
>>> Type: Distribute
>>> Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
>>> Status: Started
>>> Number of Bricks: 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gluster-0-0:/mseas-data-0-0
>>> Brick2: gluster-0-1:/mseas-data-0-1
>>> Brick3: gluster-data:/data
>>>
>>> [root@mseas-data ~]# ps -ef | grep gluster
>>>
>>> root      2781     1  0 15:16 ?        00:00:00 /usr/sbin/glusterd -p
>>> /var/run/glusterd.pid
>>> root      2897     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfsd
>>> -s localhost --volfile-id gdata.gluster-data.data -p
>>> /var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
>>> /tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
>>> /var/log/glusterfs/bricks/data.log --xlator-option
>>> *-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
>>> --brick-port 24009 --xlator-option gdata-server.listen-port=24009
>>> root      2903     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfs -s
>>> localhost --volfile-id gluster/nfs -p
>>> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
>>> /tmp/d5c892de43c28a1ee7481b780245b789.socket
>>> root      4258     1  0 15:52 ?        00:00:00 /usr/sbin/glusterfs
>>> --volfile-id=/gdata --volfile-server=mseas-data /gdata
>>> root      4475  4033  0 16:35 pts/0    00:00:00 grep gluster
>>>
>> From the ps output, the brick process (glusterfsd) doesn't seem to be
>> running on the gluster-data server. Run `gluster volume status` and
>> check if that is indeed the case. If yes, you could either restart
>> glusterd on the brick node (`service glusterd restart`) or restart the
>> entire volume (`gluster volume start gdata force`), which should bring
>> the brick process back online.
>>
>> I'm not sure why glusterd did not start the brick process when you
>> rebooted the machine in the first place. You could perhaps check the
>> glusterd log for clues.
>>
>> Hope this helps,
>> Ravi
>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley                          Email:  phaley@xxxxxxx
>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA  02139-4301
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@xxxxxxxxxxx
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
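For anyone hitting the same symptom (one brick not seen by clients after that
server is rebooted), the recovery steps discussed in this thread boil down
roughly to the following, using the volume and host names from above; this is a
sketch of what was tried here rather than a verified recipe:

    # 1. On the rebooted server, check whether its brick process is online
    gluster volume status gdata        # the gluster-data:/data brick should show Online = Y

    # 2. If it is not, restart glusterd and/or force-start the volume
    service glusterd restart
    gluster volume start gdata force

    # 3. If the other nodes report the rebooted peer as Disconnected, check connectivity
    gluster peer status                # run on each of the three servers
    ping -c 3 gluster-data             # from gluster-0-0 and gluster-0-1

    # 4. Last resort discussed above: detach and re-probe the node after wiping its
    #    glusterd state (this removes its peer and volume metadata, so use with care)
    gluster peer detach gluster-data force                    # on gluster-0-0
    rm -rf /var/lib/glusterd && service glusterd restart      # on gluster-data
    gluster peer probe gluster-data                           # back on gluster-0-0

If the probe still fails with errno 107, the glusterd log on gluster-data
(typically /var/log/glusterfs/etc-glusterfs-glusterd.vol.log) is the place to
look next.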