Hi Steve,

> > I’ve tuned the demote_secs down from 300 to 20 seconds on the
> > assumption that file locking is causing an issue.
> That is unlikely to make any meaningful change and in fact it could well
> hurt performance, depending on the workload.
>
> > <gfs_controld plock_ownership="1" plock_rate_limit="0"/>
> >
> Try turning off plock_ownership and see if that fixes the problem

We'll give this a go and see what it does.

We did manage to track down the latest issue to a bad script that the customer had written, which caused one of the nodes to exhaust all of its available memory. That had a knock-on effect on the lock_dlm process, which was unable to drop its file locks, and the effect then spread to the rest of the cluster as the other nodes became unable to open files.

> There are two things to look at. One is back traces from processes (echo
> 't' > /proc/sysrq-trigger) and the other is the glock dump
> from /sys/kernel/debug/fs/gfs2/glocks. The first tells us what is
> hanging and the second (hopefully) why. Look for glocks with 'W' in the
> flags field (f:) for their holders (H:) and it should be possible to
> correlate them with the processes which are stuck.

Thanks for the above, that's really useful.

> Do you get any messages in the syslog?

Sadly not.

I'm just looking at this page:

http://manpages.ubuntu.com/manpages/karmic/man8/gfs_controld.8.html

For a webserver, or a group of webservers, where a large number of files make up the website itself, is it worth increasing the drop_resources_time value so that file locks are flushed faster?
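For anyone following the thread, here is a rough sketch of the glock check described above: scan a glock dump for holder (H:) lines with 'W' in their flags and report the PID and command name, so they can be matched against the sysrq back traces. It assumes the holder-line layout described in the kernel's gfs2-glocks notes (s:, f:, e:, p:, then the command in square brackets); the function name waiting_holders and the sample dump are made up for illustration and are not output from our cluster.

```python
import re

# Assumed holder-line layout (check against your kernel's glock dump):
#   " H: s:SH f:W e:0 p:4501 [php] gfs2_open+0x10/0x20 [gfs2]"
HOLDER_RE = re.compile(
    r"^\s*H:\s+s:(\S+)\s+f:(\S*)\s+e:(-?\d+)\s+p:(\d+)\s+\[([^\]]*)\]"
)


def waiting_holders(dump_text):
    """Return (glock_line, pid, comm, holder_flags) for each holder flagged 'W'."""
    results = []
    current_glock = None
    for line in dump_text.splitlines():
        if line.lstrip().startswith("G:"):
            # Remember which glock the following holder lines belong to.
            current_glock = line.strip()
            continue
        match = HOLDER_RE.match(line)
        if match and "W" in match.group(2):
            results.append(
                (current_glock, int(match.group(4)), match.group(5), match.group(2))
            )
    return results


# Invented sample in the assumed format, for illustration only.
SAMPLE = """\
G:  s:EX n:2/1a2b f:lq t:EX d:EX/0 a:0 r:4
 H: s:EX f:H e:0 p:4466 [httpd] gfs2_open+0x10/0x20 [gfs2]
 H: s:SH f:W e:0 p:4501 [php] gfs2_open+0x10/0x20 [gfs2]
G:  s:SH n:5/75320 f:I t:SH d:EX/0 a:0 r:3
 H: s:SH f:EH e:0 p:4470 [httpd] gfs2_inode_lookup+0x14e/0x260 [gfs2]
"""

for glock, pid, comm, flags in waiting_holders(SAMPLE):
    print(f"pid {pid} ({comm}) waiting, holder flags '{flags}', on glock: {glock}")
```

On a real node the text passed in would be the contents of the glocks file mentioned above (debugfs mounted, read as root), and the PIDs it reports are the ones to look for in the back traces.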
Thanks
Gavin

Gavin Conway
Senior Engineer, Operations (Systems Group), UKSolutions
Telephone: 0845 004 1333, option 2
Email: gavin.conway@xxxxxxxxxxxxxxxxx
Web: http://www.uksolutions.co.uk/
UKS Ltd, Birmingham Road, Studley, Warwickshire, B80 7BG
Registered in England Number 3036806