GFS2 locking in a VM based cluster (KVM)

"C.D." <ccd.stoy.ml@xxxxxxxxx> · Thu, 17 Mar 2011 17:17:53 +0200

Hello,

Sorry guys to resurrect an old thread, but I have to say I can confirm that, too. I have a libvirt setup with multipathed FC SAN devices and KVM guests running on top of it. The physical machine is an HP 465c G7 (2 x 12-core Magny-Cours with 96GB RAM). The host OS is Fedora 14. The guests are Scientific Linux 6.

With gfs2 on a 10GB shared LUN, mounted on both VMs, I can manage ~600k plocks/sec as long as only one VM is taking locks. I started "ping_pong some_file 3" on one of the VMs and got those 600k plocks/sec. Then I started "ping_pong the_same_file 3" on the second VM and got around 360 plocks/sec (that is 360, not 360 000). No matter what I tried, I couldn't improve it. If I stopped ping_pong on one of the VMs, the rate went up only to around 500-550 plocks/sec (again 550, not 550k). After stopping the process, waiting a while and starting again on a single machine, I still got around 600k plocks/sec. I could reproduce this with both tcp and sctp, and I tried a bunch of different settings.
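For anyone who wants to reproduce the measurement without building ping_pong, here is a rough Python sketch of the same kind of test. It is not the real ping_pong tool, only loosely modelled on its lock/unlock cycle over a few byte offsets, and the function name measure_plocks and its parameters are my own invention. It should only be taken as an approximation of what ping_pong reports.

```python
import fcntl
import os
import time


def measure_plocks(path, num_locks=3, duration=1.0):
    """Cycle POSIX byte-range locks over `num_locks` offsets of `path`
    for roughly `duration` seconds and return lock cycles per second.

    Hypothetical helper, loosely modelled on ping_pong's locking loop.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Start by holding an exclusive lock on byte 0.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        current = 0
        cycles = 0
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            nxt = (current + 1) % num_locks
            # Take the next byte first, then release the one we held.
            # With a second node running the same loop on the same file,
            # this is where the two sides block on each other.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, nxt, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, current, os.SEEK_SET)
            current = nxt
            cycles += 1
        elapsed = time.monotonic() - start
        return cycles / elapsed
    finally:
        # Closing the descriptor drops any lock still held.
        os.close(fd)
```

Calling measure_plocks("/mnt/shared/some_file", 3, 10.0) on each VM at the same time (the mount point here is just a placeholder) should show whether the rate collapses under contention. Python adds overhead, so the absolute numbers will be lower than what ping_pong prints.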

Then I decided to give ocfs2 a chance. Compiling the module on SL6, and I suppose on RHEL6, is not the most straightforward task, but half an hour later I had the module compiled from the sources of the EL kernel. I stripped all debug symbols, copied the ocfs2 kernel module dir to both VMs and ran depmod -a. Then I set up the Oracle fs on top of the same LUN. I ran "ping_pong the_same_file_i_used_in_the_first_test 3" on just one machine, while both VMs had the LUN mounted: 1600k plocks/sec (as in ~1 600 000). Then I started ping_pong on the second VM. The rate did not move at all, still 1600k plocks/sec. I also tested with the real-life app. It worked very well, unlike on gfs2, where it was painfully slow with just 2 users. I created the ocfs2 filesystem with -T mail and didn't do any tuning on it, either.
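To put the figures above side by side, here is a small Python snippet that computes the slowdown under contention from the numbers I reported. The slowdown helper is just for illustration, and the inputs are my rounded readings, not precise measurements.

```python
# plocks/sec as reported in this post (approximate readings).
results = {
    "gfs2": {"single": 600_000, "contended": 360},
    "ocfs2": {"single": 1_600_000, "contended": 1_600_000},
}


def slowdown(single, contended):
    """Factor by which the lock rate drops once a second node joins."""
    return single / contended


for fs, r in results.items():
    factor = slowdown(r["single"], r["contended"])
    print(f"{fs}: slowdown factor {factor:.0f}")
```

So gfs2 loses more than three orders of magnitude once the second VM starts locking the same file, while ocfs2 shows no measurable drop.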

I'm not trying to bash gfs2. Actually, I would definitely prefer it over ocfs2 anytime, but it seems it doesn't work well in VMs for some reason. I tried both MTU 1500 and 9000, and it didn't make any difference, no matter what else I changed. I haven't tested the same setup on two physical nodes, but I have the feeling gfs2 would work just as well there as ocfs2 does on the VMs. I didn't test with hugepages for the VMs, but I somehow doubt that would make much of a difference.

I think this should be investigated by someone at RH, since they are the driving force behind KVM, libvirt, the cluster software and gfs2.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster