Re: gfs2 kernel versions

Jürgen Ladstätter <info@xxxxxxxxxxxxxxxxxx> · Fri, 8 Nov 2013 09:35:30 +0100

Hi Bob,

thanks for the links to your tools. I'm going to try them asap. Am I right that I need debugfs to be enabled for those tools to work?
Since you're so involved with this filesystem, perhaps you can answer a question and save us some further testing: we're thinking about growing the cluster's disk space (attached via iSCSI). It's currently about 3 TB and we want to add another 3 TB.
When there are many locks, we see dlm_controld use up to 20% CPU and the file system access rate drops dramatically, which pushes the load on the nodes up to 130 because of I/O wait.
Since we want to grow the disk space, we don't want to make the system unstable or unusable because of all those wait times. Does it make a difference if we make the new 3 TB partition a separate iSCSI target, and therefore a separate GFS2 filesystem, or will high I/O wait and lock times on the first iSCSI target also affect the new one? Another big question is whether dlm_controld scales well enough to keep those two targets from affecting each other.

Another weird behavior shows up with many files in a single directory. We have a directory with about 100,000 pictures in it (100 GB of data). It takes nearly forever to run something like "ls", or even worse "ls -la", and the load explodes on all nodes. Is there a known limitation with many files in a single directory?

Do you have any idea when you're going to release the next kernel version? Since CentOS more or less follows the RHEL kernel cycle, this would give us a hint on when to expect improvements. The latest kernel, 2.6.32-358.el6, is from 2013-02-21 and is not usable due to severe bugs that cause node fencing and filesystem revoking, so we're using 2.6.32-279.22.1.el6.x86_64 now, which seems quite old and is lacking a lot of features.

Thanks again, Jürgen

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Bob Peterson
Sent: Wednesday, 06 November 2013 14:39
To: linux clustering
Subject: Re: gfs2 kernel versions

----- Original Message -----
| Hi Bob,
| 
| first of all, thank you for your amazing work.
| 
| Do you include any kind of versioning with your releases so that we 
| can check what gfs2 version is running on our Gentoo with 3.1 kernel, 
| and what version is running on 2.6.32 kernel on centos?
| The PHP processes hanging in D state are kinda annoying and it's not 
| possible to use the latest centos kernel due to severe crashes in certain conditions.
| 
| Since I'm very familiar with kernels (Gentoo requires that you make 
| your own), I'm pretty sure that we can build and use a regular 
| mainstream kernel provided by kernel.org - it looks like there is also 
| much development going on by you and Mr. Whitehouse.
| 
| You say that "The more recent the version, the better and faster GFS2 
| should be" - do you mean the kernel version or GFS2 version? If the 
| latter, how can we find out what version we're running?
| 
| Thanks in advance,
| Juergen

Hi Jürgen,

There aren't really any version markers to identify which patches are in which kernel. With the RHEL, CentOS and Fedora versions, you can get the kernel version and trace that back to the tags in the source git repository. In other words, if you have a kernel version 2.6.32-358.20.1.el6, you can go back to the source repository and look through the commit messages to figure out what's in there.
Aside from that, it's hard to tell what patches are where. It's not straightforward.
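
To make the "get the kernel version and trace it back" step a little more concrete, here is a minimal sketch that splits a RHEL/CentOS-style release string (as printed by "uname -r") into its parts. The helper name split_kernel_release and the regular expression are illustrative assumptions, not part of any Red Hat tooling, and the pattern only covers the el6-style strings quoted in this thread.

```python
import re

# Assumed layout: <upstream>-<build>.<dist>[.<arch>], e.g. 2.6.32-279.22.1.el6.x86_64
_RELEASE_RE = re.compile(
    r"^(?P<upstream>\d+\.\d+\.\d+)"   # upstream kernel version, e.g. 2.6.32
    r"-(?P<build>[\d.]+)"             # distribution build number, e.g. 358.20.1
    r"\.(?P<dist>el\d+)"              # distribution tag, e.g. el6
    r"(?:\.(?P<arch>\w+))?$"          # optional architecture, e.g. x86_64
)


def split_kernel_release(release):
    """Return a dict of release parts, or None if the string does not match.

    Hypothetical helper for illustration only.
    """
    match = _RELEASE_RE.match(release)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for rel in ("2.6.32-358.20.1.el6", "2.6.32-279.22.1.el6.x86_64"):
        print(rel, "->", split_kernel_release(rel))
```

The "build" part is what distinguishes one distribution kernel from the next, so it is the piece to look for among the tags in the source repository.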

Debugging performance problems is a challenge, and there are many variables to look at. If it's a straight-up hang, we have tools like my "gfs2_hangalyzer" tool on my people page.

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/gfs2_hangalyzer.c

If it's not a true hang, but just slowness, you can check for GFS2 lock contention between the nodes with another tool I wrote: glocktop.c (same directory).
The glocktop tool is like "top" in that it shows what glocks are being waited for, and their status. If it's a directory, it will even give you the directory name.
The tool does its job by taking glock dumps and extracting the ones on which there are processes waiting. Before you run it, you should make sure your version of GFS2 has the patch for "faster glock dumps". With that patch, a glock dump should take less than a second. Without it, a glock dump can take a very long time.
(A glock dump is the same as running cat on /sys/kernel/debug/gfs2/<lock table name>/glocks)
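
As a rough illustration of "extracting the ones on which there are processes waiting", the sketch below filters a glock dump for glocks that have at least one waiting holder. It is not how glocktop itself is implemented. It assumes the dump layout described in the kernel's GFS2 glock documentation: each glock starts with a "G:" line, its holders follow on "H:" lines, and a holder that is still waiting carries a "W" in its "f:" flags field. The function name glocks_with_waiters is made up for this example.

```python
def glocks_with_waiters(dump_text):
    """Return a list of (glock_line, [waiting_holder_lines]) tuples.

    Assumes "G:" lines introduce a glock and "H:" lines describe its
    holders, with a "W" flag in the holder's f: field meaning "waiting".
    """
    results = []
    current = None   # the most recent "G:" line seen
    waiters = []     # waiting "H:" lines that belong to `current`
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("G:"):
            # A new glock begins; keep the previous one if it had waiters.
            if current is not None and waiters:
                results.append((current, waiters))
            current, waiters = line, []
        elif line.startswith("H:") and current is not None:
            flags = ""
            for token in line.split():
                if token.startswith("f:"):
                    flags = token[2:]
                    break
            if "W" in flags:
                waiters.append(line)
    if current is not None and waiters:
        results.append((current, waiters))
    return results
```

To try it on a live node, read the text of /sys/kernel/debug/gfs2/<lock table name>/glocks (with debugfs mounted and the real lock table name filled in) and pass it to the function.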

If it's not glock contention, it could be many things. You just have to go through all the possibilities and see where the bottlenecks are.

Yes, lots of development going on, still. :)

When I spoke about using the most recent code, I meant the most recent GFS2 code, which is usually coupled to a given kernel.

Regards,

Bob Peterson
Red Hat File Systems

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
