On Tue, 2009-10-20 at 10:07 +0100, Steven Whitehouse wrote:
> Hi,
>
> On Mon, 2009-10-19 at 16:30 -0600, Kai Meyer wrote:
> > Ok, so our lab test results have turned up some fun events.
> >
> > Firstly, we were able to duplicate the invalid metadata block exactly
> > under the following circumstances:
> >
> > We wanted to monkey with the VLAN that fenced/openais ran on. We
> > failed miserably, causing all three of my test nodes to believe they
> > had become lone islands in the cluster, unable to get enough votes
> > themselves to fence anybody. So we chose to simply power cycle the
> > nodes without trying to gracefully leave the cluster or reboot (they
> > are diskless servers with NFS root filesystems, so the GFS2
> > filesystem was the only thing we were risking corrupting). After the
> > nodes came back online, we began to see the same random reboots and
> > filesystem withdraws within 24 hours. The filesystem that went into
> > production and eventually hit these errors was likely not
> > reformatted just before being put into production, and I believe it
> > is highly likely that the last format done on that production
> > filesystem was done while we were still testing. I hope that as we
> > continue in our lab we can reproduce the same circumstances and give
> > you a step-by-step that will cause this issue. It'll make me feel
> > much better about our current GFS2 filesystem, which was created and
> > unmounted cleanly by a single node, then put straight into
> > production, and has been mounted only once by our current production
> > servers since it was formatted.
> >
> That's very interesting information. We are not there yet, but there
> are a number of useful hints in it. Any further information you are
> able to gather would be very interesting.
>
> > Secondly, given the way our VMs are doing I/O, we have found that
> > the cluster.conf configuration settings:
> > <dlm plock_ownership="1" plock_rate_limit="0"/>
> > <gfs_controld plock_rate_limit="0"/>
> > have lowered our %wa times from ~60% to ~30% utilization. I am
> > curious why the locking daemon defaults to such a low rate limit
> > (100). Adding these two parameters to cluster.conf raised our locks
> > per second with the ping_pong binary from 93 to 3000+ in our 5-node
> > cluster. Our throughput doesn't seem to improve by either raising
> > the locking limit or setting up jumbo frames, but processes spend
> > much less time in an I/O wait state than before (if my munin graphs
> > are believable). How likely is it that the low locking rate had a
> > hand in causing the filesystem withdraws and 'invalid metadata
> > block' errors?
> >
> I think there would be an argument for setting the default rate limit
> to 0 (i.e. off), since we seem to spend so much time telling people
> to turn off this particular feature. The reason it was added is that
> under certain circumstances it is possible to flood the network with
> plock requests, blocking openais traffic (so the cluster thinks it's
> been partitioned).
>
> I've not seen or heard of any recent reports of this, though; that is
> the original reason the feature was added. Most applications tend to
> be I/O bound rather than (fcntl) lock bound anyway, so the chances of
> it being a problem are fairly slim.
>
The reason the rate limiting was added is that the IPC system in the
original openais in FC6/RHEL 5.0 would disconnect heavy users of IPC
connections, triggering a fencing operation on the node. That problem
has been resolved since 5.3.z (also F11+).
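For anyone else wondering where those knobs live: both elements sit
directly under <cluster> in /etc/cluster/cluster.conf. A minimal sketch
(the cluster name, config_version and node details are placeholders;
remember to bump config_version and propagate the new file, e.g. with
ccs_tool update on RHEL 5):

<?xml version="1.0"?>
<cluster name="example" config_version="42">
        <!-- dlm_controld services plocks on current clusters,
             gfs_controld on older ones, hence both elements -->
        <dlm plock_ownership="1" plock_rate_limit="0"/>
        <gfs_controld plock_rate_limit="0"/>
        <clusternodes>
                <!-- existing node definitions, unchanged -->
        </clusternodes>
</cluster>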
> Setting jumbo frames won't help, as the issue is one of latency
> rather than throughput (performance-wise). Using a low-latency
> interconnect in the cluster should help fcntl lock performance,
> though.
>
Jumbo frames reduce latency AND increase throughput from origination to
delivery for heavy message traffic. For very light message traffic,
latency is increased but throughput is still improved.

> The locking rate should have no bearing on the filesystem itself. The
> locking (this refers to fcntl locks only, btw) is performed in
> userspace by dlm_controld (gfs_controld on older clusters) and merely
> passed through the filesystem. The fcntl code is identical between
> gfs1 and gfs2.
>
> > I'm still not completely confident I won't see this happen again on
> > my production servers. I'm hoping you can help me with that.
> >
> Yes, so am I :-) It sounds like we are making progress if we can
> reduce the search space for the problem, and it sounds very much from
> your message as if you believe it is a recovery issue, which sounds
> plausible to me.
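If anyone wants to reproduce Kai's lock-rate measurement without
tracking down the ping_pong binary, the loop below is a rough sketch of
the same idea (hypothetical code, not the actual ping_pong source).
Each F_SETLKW on a GFS2 file becomes a plock request serviced in
userspace by dlm_controld (gfs_controld on older clusters), so the
printed rate is roughly the plock throughput that the rate limit caps.

/*
 * plock_rate.c -- illustrative only: acquire and release an fcntl
 * byte-range lock in a tight loop and report the rate.
 * Build: cc -o plock_rate plock_rate.c
 * Run:   ./plock_rate /path/to/file-on-gfs2   (on each node)
 */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file-on-gfs2>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock fl = { .l_whence = SEEK_SET, .l_start = 0, .l_len = 1 };
    long count = 0;
    time_t start = time(NULL);

    for (;;) {
        fl.l_type = F_WRLCK;                 /* take the lock ...   */
        if (fcntl(fd, F_SETLKW, &fl) < 0) {
            perror("fcntl(F_SETLKW)");
            return 1;
        }
        fl.l_type = F_UNLCK;                 /* ... and release it  */
        fcntl(fd, F_SETLKW, &fl);

        if (++count % 1000 == 0) {
            time_t elapsed = time(NULL) - start;
            if (elapsed > 0)
                printf("%ld locks/sec\n", count / elapsed);
        }
    }
}

Run against the same file from each node; with the default
plock_rate_limit of 100 the numbers should be in the same ballpark as
the ~93 locks/sec Kai reported, and with the limit set to 0 they should
jump into the thousands.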