Re: GFS2 locking in a VM based cluster (KVM)

"C.D." <ccd.stoy.ml@xxxxxxxxx> · Mon, 21 Mar 2011 21:19:25 +0200

On Mon, Mar 21, 2011 at 1:22 PM, Steven Whitehouse <swhiteho@xxxxxxxxxx> wrote:

Hi,

On Thu, 2011-03-17 at 17:17 +0200, C.D. wrote:

> Hello,

>

> sorry guys to resurrect an old thread, but I have to say I can confirm

> that, too. I have a libvirt setup with multipathed FC SAN devices and

> KVM guests running on top of it. The physical machine is HP 465c G7 (2

> x 12 Core Magny-Cours with 96GB RAM). The host OS is Fedora 14. The

> guests are Scientific Linux 6. With gfs2 10GB shared LUN I can manage

> ~600k plocks/sec while both machines mounted the LUN. I started:

> ping_pong some_file 3 on one of the VMs and got those 600k plocks.

> Then I started ping_pong the_same_file 3 on the second machine and

> got around 360 plocks/sec (that is 360, not 360 000). No matter what I

> tried I couldn't optimize it. If I stop the ping_pong on one of the

> VMs the plocks went up to around 500-550 plocks/sec (again 550 not

> 550k). Stopping the process. Waiting a while and starting again on a

> single machine still got me around 600k plocks. This I could reproduce

> both with tcp and sctp, and I tried a bunch of different settings.

>

That is expected, since when only one node is using the plocks then the

lock will be kept locally and will thus be very fast. I assume that you

have plock ownership turned on in the cluster.conf?

from my cluster.conf:

    <dlm plock_ownership="1" plock_rate_limit="0"/>
    <gfs_controld plock_rate_limit="0" />


As soon as you introduce the second node, this will no longer be the

case, and the net result is that it will take a lot longer to grant the

lock.
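
For anyone who wants to reproduce this kind of measurement without ping_pong at hand, here is a minimal sketch in Python. It is not ping_pong itself and makes no attempt to match its modes; the function name measure_plock_rate and the one-second default are assumptions made for illustration. It takes and releases an exclusive fcntl byte-range lock on the first byte of a file in a loop and returns the number of lock/unlock cycles per second.

```python
import fcntl
import os
import time


def measure_plock_rate(path, seconds=1.0):
    """Return lock/unlock cycles per second on byte 0 of `path`.

    Hypothetical helper, not part of ping_pong or the cluster tools.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        count = 0
        start = time.monotonic()
        deadline = start + seconds
        while time.monotonic() < deadline:
            # Exclusive POSIX (fcntl) lock on 1 byte at offset 0, then release.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
            count += 1
        elapsed = time.monotonic() - start
        return count / elapsed
    finally:
        os.close(fd)
```

Calling measure_plock_rate() on a file on the shared mount from one node, and then from two nodes at the same time, should show the same kind of drop if the lock has to be granted across the cluster. The absolute numbers will be lower than ping_pong's because of interpreter overhead.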

> Then I decided to give ocfs2 a chance. Compiling the module on SL6,

> and I suppose on RHEL6, is not the most straightforward task, but

> half an hour later I got the module compiled from the sources of the

> EL kernel. Stripped all debug symbols. Copied the ocfs2 kernel module

> dir to both VM machines. Did depmod -a, I set up the oracle fs on top

> of the same LUN. Used ping_pong the_same_file_i_used_in_the_first_test

> 3 on just one machine, while both VMs have mounted the LUN. 1600k

> plocks/sec (as in ~1 600 000 ). Started ping_pong on the second host.

> The plocks did not move at all. Still 1600k plocks/sec. Tested with

> the real life app. It worked very well, unlike gfs2, which was

> painfully slow with just 2 users. I created the ocfs2 with -T mail, I

> didn't do any tuning on it, either.

>

Unless you are using OCFS2 with the RHCS cluster suite, it does not

support clustered fcntl locks. As a result you are probably measuring

the speed of local fcntl locking, not clustered locking.
I am not using it with RHCS, and I had deduced that I'm measuring the locking of the fs on individual nodes. Yet the FS seems to be working all right (it's not insanely fast, but it performs sufficiently for this setup).


What tuning did you do on GFS2? What options do you have in cluster.conf

relating to fcntl locks?
I fiddled with quota turned off, noatime, nodiratime, and some options under /sys, but I couldn't achieve good results. Actually, I couldn't achieve any results that were different from those ~360 locks/sec. The "feel" of the files served by the webserver through gfs2 and ocfs2 is night and day, though. So even though I'm not measuring the ping_pongs of ocfs2 with clustered fcntl locks, it seems it performs significantly better inside VMs, or these particular VMs, whichever of the two is the more correct statement.

I'm currently in the process of setting up 5 VMs on two different physical hosts and I'm going the OCFS2 route. However, I could spend some time testing with gfs2 if you point me in the right direction. I could also try to test ocfs2 with RHCS, but I suppose the result will not be much different, as I suspect the cause is precisely the lock manager of the cluster tools.

> I'm not trying to bash gfs2, actually I would definitely prefer it

> over ocfs2 anytime, however it seems it doesn't work well with VM for

> some reason. I have used both mtu 1500 and 9000 also, it just didn't

> make any difference, no matter what I have tried. I haven't tested the

> same setup on top of two physical nodes, but I have the feeling it

> will work just as good as ocfs2 on the VMs. I didn't test with

> hugepages for the VMs, but I somehow doubt that would make much of a

> difference.

>

> I think this should be investigated by someone at RH possibly because

> they are the driving force behind KVM, libvirt, the cluster software

> and gfs2.

> --

> Linux-cluster mailing list

> Linux-cluster@xxxxxxxxxx

> https://www.redhat.com/mailman/listinfo/linux-cluster

The mtu is unlikely to make much of a difference. With locking the most

important aspect is latency, rather than throughput,

If it makes any difference, both VMs were running on the same physical host with a bridged network, so I suppose the latency should be low enough? Actually, the larger MTU seems to increase latency, but not by enough to be significant, I think.
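
To put the latency point in numbers, the rates quoted earlier in this thread can be turned into an average time per lock. This is only arithmetic on the reported figures, treating each reported plock as one operation (an assumption); the helper name seconds_per_lock is made up for this sketch.

```python
def seconds_per_lock(rate_per_sec):
    """Average time spent per lock operation at a given plocks/sec rate."""
    return 1.0 / rate_per_sec


# Rates as reported in this thread for gfs2.
reported = [
    ("gfs2, one node running ping_pong", 600_000),
    ("gfs2, two nodes running ping_pong", 360),
]

for label, rate in reported:
    micros = seconds_per_lock(rate) * 1e6
    print(f"{label}: {micros:.1f} microseconds per lock")
# gfs2, one node running ping_pong: 1.7 microseconds per lock
# gfs2, two nodes running ping_pong: 2777.8 microseconds per lock
```

So at 360 plocks/sec each lock takes roughly 2.8 ms on average, compared with under 2 microseconds when the lock stays local.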


Steve.

