Installed 3.4.3 exactly 2 weeks ago on all our brick servers, and I'm happy to report that we've not had a crash since. Thanks for all the good work.

On Tue, 2014-04-15 at 14:22 +0800, Franco Broi wrote:
> The whole system came to a grinding halt today and no amount of
> restarting daemons would make it work again. What was really odd was
> that gluster vol status said everything was fine, and yet all the client
> mount points had hung.
>
> On the node that was exporting Gluster NFS I had zombie processes, so I
> decided to reboot. It took a while for the ZFS JBODs to sort themselves
> out, but I was relieved when it all came back up - except that the df
> size on the clients was wrong...
>
> gluster vol info and gluster vol status said everything was fine, but it
> was obvious that 2 of my bricks were missing. I restarted everything and
> still had 2 missing bricks. I remounted the fuse clients and still no
> good.
>
> Just out of sheer desperation, and for no good reason, I disabled the
> Gluster NFS export and magically the 2 missing bricks reappeared and the
> filesystem was back to its normal size. I turned NFS exports back on and
> everything stayed working.
>
> I'm not trying to belittle all the good work done by the Gluster
> developers, but this really doesn't look like a viable big-data
> filesystem at the moment. We've currently got 800TB and are about to add
> another 400TB, but quite honestly the prospect terrifies me.
>
> On Tue, 2014-04-15 at 08:35 +0800, Franco Broi wrote:
> > On Mon, 2014-04-14 at 17:29 -0700, Harshavardhana wrote:
> > > >
> > > > Just distributed.
> > > >
> > >
> > > With a pure distributed setup you have to take downtime, since the
> > > data isn't replicated.
> >
> > If I shut down the server processes, won't the clients just wait for
> > them to come back up, i.e. like NFS hard mounts? I don't mind an
> > interruption; I just want to avoid killing all jobs that are currently
> > accessing the filesystem if at all possible. Our users have suffered a
> > lot recently with filesystem outages.
> >
> > By the way, how does one shut down the glusterfs processes without
> > stopping a volume? It would be nice to have a quiesce or freeze option
> > that just stalls all access while maintenance takes place.
> >
> > > >> > 3.4.1 to 3.4.3-3 shouldn't cause problems with existing clients
> > > >> > and other servers, right?
> > > >>
> > > >> You mean 3.4.1 and 3.4.3 coexisting within a cluster?
> > > >
> > > > Yes, at least for the duration of the upgrade.
> > >
> > > Yeah, the 3.4.x releases are backward compatible with each other in
> > > any case.
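
For anyone hitting the same symptoms, the checks and the NFS toggle described above boil down to roughly the commands below. This is only a sketch: the volume name "data" and the client mount point "/data" are placeholders for your own.

    # On any server: what gluster itself thinks of the bricks
    gluster volume info data
    gluster volume status data

    # On a client: what the fuse mount actually reports
    df -h /data

    # Turn the built-in Gluster NFS export off and back on for the volume
    gluster volume set data nfs.disable on
    gluster volume set data nfs.disable off

To take one server down for maintenance without stopping the volume, the usual approach (again, only a sketch; adjust to your init system) is to stop the management daemon and then kill that server's brick and NFS processes. Fuse clients should reconnect to the bricks when they come back, but on a pure distributed volume the files on that server's bricks are simply unavailable in the meantime, which is the downtime referred to above.

    service glusterd stop       # management daemon
    pkill -x glusterfsd         # brick processes on this server
    pkill -x glusterfs          # Gluster NFS server, self-heal daemon, etc.
                                # (also kills any fuse mounts running on this box)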