Hi Jürgen,

----- Original Message -----
| Hi Bob,
|
| Thanks for the links to your tools. I'm going to try them ASAP. Am I right
| that I need debugfs to be enabled for those tools to work?

Yes, they rely on debugfs being mounted. I've never seen debugfs impact
performance, even slightly, so it should not cause a performance problem.

| Since you're so involved with this file system, you can probably answer this
| question so that we don't need any further testing: right now we're thinking
| about growing the cluster in terms of disk space (iSCSI connection). Right
| now it's about 3TB and we want to grow it by another 3TB.
| When there are many locks, we see that dlm_controld uses up to 20% of CPU
| power and the file system access rate drops dramatically, causing the nodes'
| load to climb to 130 because of the I/O wait time.

I'm involved in the gfs2 kernel code and gfs2-utils. dlm_controld is something
I've not looked at, except for an occasional glance. I think it manages POSIX
locking, among other things. When gfs2 gets a POSIX lock request, we hand it
off to DLM, which hands it off to user space, corosync, etc. I haven't studied
that path much. On the other hand, flocks are handled between GFS2 and DLM,
without user space getting involved.

So I think the answer is: it all depends on the workload and the application.
The GFS2 kernel code itself wouldn't be impacted much by increasing the size
of the file system. I can't predict what dlm_controld will do, because that
depends entirely on your workload and its use of POSIX locks.

| Since we want to grow the disk space, we don't want to make the system
| unstable or unusable because of all those waiting times. Does it make a
| difference if we make the new 3TB partition a new iSCSI target and therefore
| a new gfs2 file system, or will higher I/O waits / lock times on the first
| iSCSI target also have an impact on the new one? Another big question is
| whether dlm_controld scales well enough to keep those two targets separate.

Again, it depends on what the workload does.

| Another weird behavior is the one with many files in a single directory. We
| have a directory with about 100,000 pictures in it (100GB of data). It takes
| nearly forever to do something like "ls" or, even worse, "ls -la", and the
| load explodes on all nodes. Is there some kind of known limitation with many
| files in a single directory?

We've always known that the more files you have in a single directory, the
slower things get, because things like "ls -la" will stat every inode, which
requires locking. If you don't stat the inodes (just a plain "ls", for
example), it should be fairly fast. So we've always recommended breaking up
large directories into several smaller ones to get better performance. It's
simply the nature of all clustered file systems, due to the necessary
cooperation between the nodes.
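To make the "ls" versus "ls -la" difference concrete, here's a rough little
test program (nothing official, just a sketch you could run against that
picture directory on one node). The first pass only reads the directory
itself; the second pass stats every entry, which is roughly what "ls -la"
does, and that's where the per-inode locking comes in:

/*
 * Sketch: compare a bare directory read (roughly what "ls" does) with a
 * read that also stats every entry (roughly what "ls -la" does).  On GFS2
 * the stat of each entry has to take that inode's cluster lock, which is
 * where the time and the load go with 100,000 files in one directory.
 */
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

static double now(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

static long scan(const char *dir, int do_stat)
{
	DIR *d = opendir(dir);
	struct dirent *de;
	long count = 0;

	if (!d) {
		perror(dir);
		exit(1);
	}
	while ((de = readdir(d)) != NULL) {
		if (do_stat) {
			char path[4096];
			struct stat st;

			/* The expensive part on a cluster: each stat needs
			   the corresponding inode's lock. */
			snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
			if (stat(path, &st) == 0)
				count++;
		} else {
			count++;	/* directory data only, no per-inode locks */
		}
	}
	closedir(d);
	return count;
}

int main(int argc, char **argv)
{
	double t0;
	long n;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
		return 1;
	}

	t0 = now();
	n = scan(argv[1], 0);
	printf("readdir only : %ld entries in %.2f seconds\n", n, now() - t0);

	t0 = now();
	n = scan(argv[1], 1);
	printf("readdir+stat : %ld entries in %.2f seconds\n", n, now() - t0);
	return 0;
}

If the second pass is dramatically slower than the first, that's the locking
cost you're seeing, and splitting the pictures across several subdirectories
should bring it down.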
| Do you have any clue on when you're going to release the next kernel version?
| Since CentOS more or less sticks to the RHEL kernel cycle, this would give us
| some hint on when to expect improvements. The last kernel, 2.6.32-358.el6, is
| from 2013-02-21 and not usable due to severe bugs that cause node fencing and
| file system revokes, so we're using 2.6.32-279.22.1.el6.x86_64 now, which
| seems quite old and lacks a lot of features.

The kernel release schedule is out of my control in the RHEL space, and I
won't try to predict it. Red Hat spins lots of kernels all the time for our
own internal testing, and those kernels are what I use in my daily work, so I
don't often need to know exactly when a particular kernel is released to
customers. When CentOS picks up the official releases is entirely up to them;
I have no clue what they do, or when.

Regards,

Bob Peterson
Red Hat File Systems