Hi,

On Tue, 2009-10-20 at 02:22 -0700, Steven Dake wrote:
> On Tue, 2009-10-20 at 10:07 +0100, Steven Whitehouse wrote:
> > Hi,
> >
> > On Mon, 2009-10-19 at 16:30 -0600, Kai Meyer wrote:
> > > Ok, so our lab test results have turned up some fun events.
> > >
> > > Firstly, we were able to duplicate the invalid metadata block exactly
> > > under the following circumstances:
> > >
> > > We wanted to monkey with the VLAN that fenced/openais ran on. We failed
> > > miserably causing all three of my test nodes to believe that they became
> > > lone islands in the cluster, unable to get enough votes themselves to
> > > fence anybody. So we chose to simply power cycle the nodes without
> > > trying to gracefully leave the cluster or reboot (they are diskless
> > > servers with NFS root filesystems so the GFS2 filesystem is the only
> > > thing we were risking corruption with.) After the nodes came back
> > > online, we began to see the same random reboots and filesystem withdraws
> > > within 24 hours. The filesystem that went into production that
> > > eventually hit these errors was likely not reformatted just before
> > > putting into production, and I believe it is highly likely that the last
> > > format done on that production filesystem was done while we were still
> > > doing testing. I hope that as we continue in our lab, we can reproduce
> > > the same circumstances, and give you a step-by-step that will cause this
> > > issue. It'll make me feel much better about our current GFS2 filesystem
> > > that was created and unmounted cleanly by a single node, and then put
> > > straight into production, and has been only mounted once by our current
> > > production servers since it was formatted.
> > >
> > That is very interesting information. We are not there yet, but there are a
> > number of useful hints in that. Any further information you are able to
> > gather would be very interesting.
> >
> > > Secondly, the way our VMs are doing I/O, we have found the cluster.conf
> > > configuration settings:
> > > <dlm plock_ownership="1" plock_rate_limit="0"/>
> > > <gfs_controld plock_rate_limit="0"/>
> > > have lowered our %wa times from ~60% to ~30% utilization. I am curious
> > > why the locking daemon is set to default to such a low number by default
> > > (100). Adding these two parameters in the cluster.conf raised our locks
> > > per second with the ping_pong binary from 93 to 3000+ in our 5 node
> > > cluster. Our throughput doesn't seem to improve by either upping the
> > > locking limit or setting up jumbo frames, but processes spend much less
> > > time in I/O wait state than before (if my munin graphs are believable).
> > > How likely is it that the low locking rate had a hand in causing the
> > > filesystem withdraws and 'invalid metadata block' errors?
> > >
> > I think there would be an argument for setting the default rate limit to
> > 0 (i.e. off) since we seem to spend so much time telling people to turn
> > off this particular feature. The reason that it was added is that under
> > certain circumstances it is possible to flood the network with plock
> > requests resulting in the blocking of openais traffic (so the cluster
> > thinks it's been partitioned).
> >
> > I've not seen or heard of any recent reports of this, though, but that
> > is the original reason the feature was added. Most applications tend to
> > be I/O bound rather than (fcntl) lock bound anyway, so that the chances
> > of it being a problem are fairly slim.
> >
>
> The reason the limiting was added was because the IPC system in original
> openais in fc6/rhel5.0 would disconnect heavy users of ipc connections,
> triggering a fencing operation of the node. That problem has been
> resolved since 5.3.z (also f11+).
>
In which case there would seem to be no argument about setting the
default to disabled now then.
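
For anyone following along, here is a minimal sketch of where the two
elements Kai quoted would sit in cluster.conf. It assumes they are direct
children of the top-level cluster element. The cluster name and
config_version values are placeholders, not taken from Kai's setup, and
the rest of the file (clusternodes, fencing and so on) is elided.

```xml
<?xml version="1.0"?>
<!-- Sketch only: name and config_version are placeholder values -->
<cluster name="example" config_version="2">
  <!-- plock_rate_limit="0" turns the plock rate limiting off -->
  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>
  <!-- existing clusternodes, fencedevices, etc. stay as they are -->
</cluster>
```
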
> > Setting jumbo frames won't help as the issue is one of latency rather
> > than throughput (performance-wise). Using a low-latency interconnect in
> > the cluster should help fcntl lock performance though.
> >
>
> jumbo frames reduces latency AND increases throughput from origination
> to delivery for heavy message traffic. For very light message traffic
> latency is increased but throughput is still improved.
>
Yes, but in reality the traffic is unlikely to be very heavy in terms of
total bandwidth as there is a lot of "send message, wait for reply" type
traffic.

Steve.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster