Hi Yevheniy,

I am interested in applying this patch to my 2 node cluster configured on
RHEL 6.2 with CLVMD + GFS2 + CTDB + CMAN. Can you please guide me on how to
download and apply this patch in that environment?

Thanks
Sathya Narayanan V
Solution Architect
M +91 9940680173 | T +91 44 42199500 | Service Desk +91 44 42199521
SERVICE - In PRECISION IT is a PASSION
---------------------------------------------------------------------------
Precision Infomatic (M) Pvt Ltd
22, 1st Floor, Habibullah Road, T. Nagar, Chennai - 600 017. India.
www.precisionit.co.in

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of linux-cluster-request@xxxxxxxxxx
Sent: Tuesday, December 27, 2011 10:30 PM
To: linux-cluster@xxxxxxxxxx
Subject: Linux-cluster Digest, Vol 92, Issue 19

Send Linux-cluster mailing list submissions to
        linux-cluster@xxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
        https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
        linux-cluster-request@xxxxxxxxxx

You can reach the person managing the list at
        linux-cluster-owner@xxxxxxxxxx

When replying, please edit your Subject line so it is more specific than
"Re: Contents of Linux-cluster digest..."

Today's Topics:

   1. [PATCH] dlm: faster dlm recovery (Yevheniy Demchenko)
   2. Re: Corosync memory problem (Steven Dake)

----------------------------------------------------------------------

Message: 1
Date: Mon, 26 Dec 2011 23:52:29 +0100
From: Yevheniy Demchenko <zheka@xxxxxx>
To: linux-cluster@xxxxxxxxxx
Subject: [PATCH] dlm: faster dlm recovery
Message-ID: <4EF8FAAD.50504@xxxxxx>
Content-Type: text/plain; charset=ISO-8859-1

Avoid running find_rsb_root by storing the last recovered rsb address for
each node. This makes dlm recovery much faster for filesystems with a large
number of files.

Signed-off-by: Yevheniy Demchenko <zheka@xxxxxx>
---
Current dlm recovery uses a small (4096 byte) buffer to communicate between
dlm_copy_master_names and dlm_directory_recovery. This leads to running
find_rsb_root N*32/4096 times, where N is the number of locks to recover
and 32 is DLM_RESNAME_MAXLEN+1. find_rsb_root itself takes N*c to complete,
where c is some constant. Eventually, dlm recovery time is proportional to
N*N. For an ocfs2 filesystem with one directory containing 300000 small
files, every mount on another node takes more than 2.5 minutes and umount
more than 5 minutes on fairly modern hardware with a 10Gb interconnect.
During dlm recovery the FS is not available on any node. This patch makes
mounts and umounts on non-locking-master nodes take less than 2 seconds. It
is not limited to ocfs2 and might make dlm recovery faster in general (e.g.
for gfs2).
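To put rough numbers on the N*N claim (an illustrative estimate only, not
part of the patch; the constants are taken from the description above), a
small stand-alone C program:

/* Back-of-the-envelope cost model for dlm directory recovery.
 * Illustrative only: it just multiplies the figures quoted above. */
#include <stdio.h>

int main(void)
{
	const long n    = 300000; /* locks (rsbs) to recover */
	const long slot = 32;     /* DLM_RESNAME_MAXLEN + 1 bytes per name */
	const long buf  = 4096;   /* recovery buffer size */

	/* Each buffer carries about buf/slot = 128 names, so the buffer is
	 * refilled roughly n/128 times; the unpatched code rescans up to n
	 * rsbs (find_rsb_root) on every refill. */
	long refills   = (n + buf / slot - 1) / (buf / slot);
	long unpatched = refills * n; /* ~7 * 10^8 list steps */
	long patched   = refills;     /* one cached compare per refill */

	printf("buffer refills:        %ld\n", refills);
	printf("unpatched list steps: ~%ld\n", unpatched);
	printf("patched list steps:   ~%ld\n", patched);
	return 0;
}

On the 300000-file test below that works out to roughly 7*10^8 list
traversals before the patch versus a few thousand constant-time cache hits
after it, which is consistent with the drop from minutes to seconds.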
Test case: 2 node RHCS cluster, OCFS2 with the cman cluster stack.
/sys/kernel/config/dlm/cluster/{lkbtbl_size,dirtbl_size,rsbtbl_size} = 16384 on both nodes

On node 1:
#mkfs.ocfs2 --fs-features=backup-super,sparse,inline-data,extended-slotmap,indexed-dirs,refcount,xattr,usrquota,grpquota,unwritten /dev/vg1/test1
#mount /dev/vg1/test1 /mnt/temp -o noatime,nodiratime
#mkdir /mnt/temp/test1
#for i in $(seq 1 300000) ; do dd if=/dev/urandom bs=4096 count=1 of=/mnt/temp/test1/$i ; done
#umount /mnt/temp
#----- leave dlm and destroy locks
#mount /dev/vg1/test1 /mnt/temp -o noatime,nodiratime
#time (ls -l /mnt/temp/test1 | wc -l )
#----- create 300000 RR locks on node 1

On node 2:
#mount /dev/vg1/test1 /mnt/temp -o noatime,nodiratime
#----- dlm recovery starts and takes a looooong time if dlm is not patched
#umount /mnt/temp
#----- even looooooonger, FS is not available on any node while recovery is running

After patching, both operations on node 2 take less than 2 seconds.

For now, the patch tries to detect inconsistencies and reverts to the
previous behaviour if there are any. These tests can be dropped together
with find_rsb_root and some excessive code in the future.
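In outline, the change remembers, per remote node, the last rsb whose name
was copied into the outgoing buffer, so the next request (which asks to
resume right after that name) can reuse the cached pointer instead of
rescanning the root list. A minimal user-space sketch of that pattern
follows; all struct and function names here are made up for illustration,
only the per-node caching idea mirrors the real change in
dlm_copy_master_names:

/* Simplified, user-space model of the "resume after the last sent name"
 * pattern.  Names here are hypothetical; only the idea of caching the
 * last handed-out entry per node mirrors the patch below. */
#include <stddef.h>
#include <string.h>

#define MAX_NODES 16   /* arbitrary for the sketch */

struct res {
	struct res *next;
	char name[32];     /* DLM_RESNAME_MAXLEN + 1 in the real code */
};

/* One cached cursor per node, analogous to ls->ls_recover_last_rsb[]. */
static struct res *last_sent[MAX_NODES];

/* Unpatched behaviour: walk the whole list to find the resume point. */
static struct res *find_by_name(struct res *head, const char *name)
{
	struct res *r;

	for (r = head; r; r = r->next)     /* O(N) on every buffer refill */
		if (!strcmp(r->name, name))
			return r;
	return NULL;
}

/* Patched behaviour: trust the per-node cache when it matches, fall back
 * to the linear scan when it does not. */
struct res *resume_point(struct res *head, int node, const char *name)
{
	struct res *r = last_sent[node];

	if (r && !strcmp(r->name, name))   /* O(1) in the common case */
		return r;
	return find_by_name(head, name);   /* safety fallback, as in the patch */
}

/* After copying an entry into the outgoing buffer, remember it. */
void remember(int node, struct res *r)
{
	last_sent[node] = r;
}

The actual patch below stores the cursor in ls->ls_recover_last_rsb
(allocated per lockspace in member.c), checks both the length and the name
before trusting it, and logs an error before falling back to find_rsb_root.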
diff -uNr vanilla/fs/dlm/dir.c v1.0/fs/dlm/dir.c
--- vanilla/fs/dlm/dir.c	2011-09-29 15:29:00.000000000 +0200
+++ v1.0/fs/dlm/dir.c	2011-12-26 22:00:21.068403493 +0100
@@ -196,6 +196,16 @@
 	}
 }
 
+static int nodeid2index(struct dlm_ls *ls, int nodeid) {
+	int i;
+	for (i = 0; i < ls->ls_num_nodes; i++) {
+		if (ls->ls_node_array[i] == nodeid)
+			return (i);
+	}
+	log_debug(ls, "index not found for nodeid %d", nodeid);
+	return (-1);
+}
+
 int dlm_recover_directory(struct dlm_ls *ls)
 {
 	struct dlm_member *memb;
@@ -375,11 +385,28 @@
 	struct dlm_rsb *r;
 	int offset = 0, dir_nodeid;
 	__be16 be_namelen;
+	int index;
 
 	down_read(&ls->ls_root_sem);
 
+	index = nodeid2index(ls, nodeid);
+
 	if (inlen > 1) {
-		r = find_rsb_root(ls, inbuf, inlen);
+		if ((index > -1) && (ls->ls_recover_last_rsb[index])) {
+			if (inlen == ls->ls_recover_last_rsb[index]->res_length &&
+			    !memcmp(inbuf, ls->ls_recover_last_rsb[index]->res_name, inlen)) {
+				r = ls->ls_recover_last_rsb[index];
+			} else {
+				/* This should never happen! */
+				log_error(ls, "copy_master_names: rsb cache failed 1: node %d: cached rsb %1.31s, needed rsb %1.31s;", nodeid,
+					  ls->ls_recover_last_rsb[index]->res_name, inbuf);
+				r = find_rsb_root(ls, inbuf, inlen);
+			}
+		} else {
+			/* Left for safety reasons, we should never get here */
+			r = find_rsb_root(ls, inbuf, inlen);
+			log_error(ls, "copy_master_names: rsb cache failed 2: searching for %1.31s, node %d", inbuf, nodeid);
+		}
 		if (!r) {
 			inbuf[inlen - 1] = '\0';
 			log_error(ls, "copy_master_names from %d start %d %s",
@@ -421,6 +448,7 @@
 		offset += sizeof(__be16);
 		memcpy(outbuf + offset, r->res_name, r->res_length);
 		offset += r->res_length;
+		ls->ls_recover_last_rsb[index] = r;
 	}
 
 	/*
diff -uNr vanilla/fs/dlm/dlm_internal.h v1.0/fs/dlm/dlm_internal.h
--- vanilla/fs/dlm/dlm_internal.h	2011-09-29 15:32:00.000000000 +0200
+++ v1.0/fs/dlm/dlm_internal.h	2011-12-22 23:51:00.000000000 +0100
@@ -526,6 +526,7 @@
 	int			ls_recover_list_count;
 	wait_queue_head_t	ls_wait_general;
 	struct mutex		ls_clear_proc_locks;
+	struct dlm_rsb		**ls_recover_last_rsb;
 
 	struct list_head	ls_root_list;	/* root resources */
 	struct rw_semaphore	ls_root_sem;	/* protect root_list */
diff -uNr vanilla/fs/dlm/member.c v1.0/fs/dlm/member.c
--- vanilla/fs/dlm/member.c	2011-09-29 15:29:00.000000000 +0200
+++ v1.0/fs/dlm/member.c	2011-12-23 19:55:00.000000000 +0100
@@ -128,6 +128,9 @@
 
 	kfree(ls->ls_node_array);
 	ls->ls_node_array = NULL;
+
+	kfree(ls->ls_recover_last_rsb);
+	ls->ls_recover_last_rsb = NULL;
 
 	list_for_each_entry(memb, &ls->ls_nodes, list) {
 		if (memb->weight)
@@ -146,6 +149,11 @@
 	array = kmalloc(sizeof(int) * total, GFP_NOFS);
 	if (!array)
 		return;
+
+	ls->ls_recover_last_rsb = kcalloc(ls->ls_num_nodes + 1, sizeof(struct dlm_rsb *), GFP_NOFS);
+
+	if (!ls->ls_recover_last_rsb)
+		return;
 
 	list_for_each_entry(memb, &ls->ls_nodes, list) {
 		if (!all_zero && !memb->weight)

--
Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o.

------------------------------

Message: 2
Date: Tue, 27 Dec 2011 09:00:25 -0700
From: Steven Dake <sdake@xxxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: Corosync memory problem
Message-ID: <4EF9EB99.1050001@xxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1

On 12/21/2011 11:04 AM, Chris Alexander wrote:
> An update in case anyone ever runs into something like this - we had
> corosync-notify running on the servers, and once we removed that and
> restarted the cluster stack, corosync seemed to return to normal.
>
> Additionally, according to the corosync mailing list, the cluster 1.2.3
> version is basically very similar to (if not the same as) the 1.4 that
> they currently have released; someone has been backporting.
>

The upstream 1.2.3 version hasn't had any backports applied to it. Only
the RHEL 1.2.3-z versions have been backported.

Regards
-steve

> Cheers
>
> Chris
>
> On 19 December 2011 19:01, Chris Alexander <chris.alexander@xxxxxxxxxx
> <mailto:chris.alexander@xxxxxxxxxx>> wrote:
>
>     Hi all,
>
>     You may remember our recent issue; I believe it is being worsened,
>     if not caused, by another problem we have encountered.
>
>     Every few days our nodes are (non-simultaneously) being fenced due
>     to corosync taking up vast amounts of memory (i.e. 100% of the box).
>     Please see a sample log message [1] (we have several just like it)
>     which occurs when this happens. Note that it is not always corosync
>     being killed - but it is clearly corosync eating all the memory (see
>     top output from three servers at various times since their last
>     reboot, [2] [3] [4]).
>
>     The corosync version is 1.2.3:
>     [g@cluster1 ~]$ corosync -v
>     Corosync Cluster Engine, version '1.2.3'
>     Copyright (c) 2006-2009 Red Hat, Inc.
>
>     We had a bit of a dig around and there are a significant number of
>     bugfix updates which address various segfaults, crashes, memory
>     leaks etc. in this minor version as well as in subsequent minor
>     versions. [5] [6]
>
>     We're trialling the Fedora 14 (fc14) RPMs for corosync and
>     corosynclib (v1.4.2) to see if they fix the particular issue we are
>     seeing (i.e. whether or not the memory keeps spiralling way out of
>     control).
>
>     Has anyone else seen an issue like this, and is there any known way
>     to debug or fix it? If we can assist debugging by providing further
>     information, please specify what this is (and, if non-obvious, how
>     to get it).
>
>     Thanks again for your help
>
>     Chris
>
>     [1] http://pastebin.com/CbyERaRT
>     [2] http://pastebin.com/uk9ZGL7H
>     [3] http://pastebin.com/H4w5Zg46
>     [4] http://pastebin.com/KPZxL6UB
>     [5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
>     [6] http://rhn.redhat.com/errata/RHBA-2011-1515.html
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster

------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 92, Issue 19
*********************************************

This communication may contain confidential information. If you are not the
intended recipient it may be unlawful for you to read, copy, distribute,
disclose or otherwise use the information contained within this
communication. Errors and omissions may occur in the contents of this email
arising out of or in connection with data transmission, network malfunction
or failure, machine or software error, malfunction, or operator errors by
the person who is sending the email. Precision Group accepts no
responsibility for any such errors or omissions. The information, views and
comments within this communication are those of the individual and not
necessarily those of Precision Group. All email that is sent from/to
Precision Group is scanned for the presence of computer viruses, security
issues and inappropriate content. However, it is the recipient's
responsibility to check any attachments for viruses before use.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster