Re: GFS2 with IMAP Maildir server

Sounds like you are running into the same bug that I ran into with GFS2 on a similar setup nearly two years ago, except that I could produce a lock-up in under 2 seconds every time. The solution is to use GFS1 if you really want to stick with that setup, but bear in mind that, regardless of the cluster file system (GFS1, GFS2, OCFS2), performance will scale _inversely_: it gets worse as you add nodes. Cluster file systems really don't work well with millions of small files.

You might, instead, want to look into something like DBMail with a MySQL proxy to serialize all writes to a single node.

You can, of course, still use GFS1 for the root file system to share the OS install. Look at the Open Shared Root project if this is of interest.

Gordan

Flavio Junior wrote:
Hi folks....

I'm trying to use GFS2 in a mail server scenario with:

- CentOS 5.3 updated
- Dovecot IMAP/Maildir
- Postfix

To make the servers active/active I'm using CTDB (http://ctdb.samba.org).

Some info that could be relevant:
[root@pinky ~]# uname -a
Linux pinky 2.6.18-128.1.16.el5 #1 SMP Tue Jun 30 06:07:26 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@pinky ~]# rpm -qa | grep -E 'gfs2|clust|kernel|cman|openais'
kernel-2.6.18-128.1.16.el5
gfs2-utils-0.1.53-1.el5_3.3
modcluster-0.12.1-2.el5.centos
cluster-cim-0.12.1-2.el5.centos
kernel-devel-2.6.18-128.1.10.el5
openais-0.80.3-22.el5_3.8
system-config-cluster-1.0.55-1.0
kernel-2.6.18-128.1.6.el5
kernel-2.6.18-128.1.10.el5
kernel-devel-2.6.18-128.1.16.el5
lvm2-cluster-2.02.40-7.el5
cluster-snmp-0.12.1-2.el5.centos
kernel-headers-2.6.18-128.1.16.el5
kernel-devel-2.6.18-128.1.6.el5
cman-2.0.98-1.el5_3.4
[root@pinky ~]# grep /home /etc/fstab
/dev/homeClusterVG/home_vmail /home gfs2 auto,noatime,quota=off,noexec,nodev,_netdev 0 0


Everything works fine for some time, but two or three times a day some dovecot/deliver processes hang in D state, and the only way to recover is to reboot the node.

I'm not a developer and don't know much about debugging. Because of other problems I had a while ago, I learned to use "sysrq-t", and here is the output for two of these processes:

Pastebin: http://pastebin.ca/1483264

Jul  3 15:45:20 cerebro kernel: deliver D ffff81007e442800 0 24420 23846 (NOTLB)
Jul  3 15:45:20 cerebro kernel:  ffff810013885e08 0000000000000082 ffff810013885d68 0000000000000092
Jul  3 15:45:20 cerebro kernel:  ffff810013885e20 0000000000000001 ffff8100141870c0 ffff81000904b0c0
Jul  3 15:45:20 cerebro kernel:  0000052a72ff2a70 000000000000034a ffff8100141872a8 000000036caf5000
Jul  3 15:45:20 cerebro kernel: Call Trace:
Jul  3 15:45:20 cerebro kernel:  [<ffffffff88562a7d>] :dlm:dlm_posix_lock+0x172/0x210
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8009eba4>] autoremove_wake_function+0x0/0x2e
Jul  3 15:45:20 cerebro kernel:  [<ffffffff88591c7a>] :gfs2:gfs2_lock+0xc3/0xcf
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8003a39e>] fcntl_setlk+0x11e/0x273
Jul  3 15:45:20 cerebro kernel:  [<ffffffff800b5659>] audit_syscall_entry+0x16e/0x1a1
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8002ea66>] sys_fcntl+0x269/0x2dc
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0


Jul  3 15:45:21 cerebro kernel: deliver D ffff81000238f480 0 1358 32225 (NOTLB)
Jul  3 15:45:21 cerebro kernel:  ffff8100086cfe08 0000000000000082 ffff8100086cfd68 0000000000000092
Jul  3 15:45:21 cerebro kernel:  ffff8100086cfe20 0000000000000001 ffff81000904b0c0 ffff81007ff28100
Jul  3 15:45:21 cerebro kernel:  0000052a72ff2ca2 0000000000000232 ffff81000904b2a8 000000037ed68a00
Jul  3 15:45:21 cerebro kernel: Call Trace:
Jul  3 15:45:21 cerebro kernel:  [<ffffffff88562a7d>] :dlm:dlm_posix_lock+0x172/0x210
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8009eba4>] autoremove_wake_function+0x0/0x2e
Jul  3 15:45:21 cerebro kernel:  [<ffffffff88591c7a>] :gfs2:gfs2_lock+0xc3/0xcf
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8003a39e>] fcntl_setlk+0x11e/0x273
Jul  3 15:45:21 cerebro kernel:  [<ffffffff800b5659>] audit_syscall_entry+0x16e/0x1a1
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8002ea66>] sys_fcntl+0x269/0x2dc
Jul  3 15:45:21 cerebro kernel:  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
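
Both traces above have the same shape: sys_fcntl calls fcntl_setlk, which goes through gfs2_lock into dlm_posix_lock, so each deliver process appears to be waiting on a POSIX (fcntl) lock. For longer sysrq-t dumps, a small script can pull the module and symbol out of every frame. This is only a sketch: the function name frames and the regular expression are illustrative, and it assumes the bracketed-address frame format shown in these logs and a Python 3 interpreter.

```python
import re

# Matches frames such as "[<ffffffff88562a7d>] :dlm:dlm_posix_lock+0x172/0x210".
# The ":module:" prefix is optional; core-kernel symbols have none.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(\w+):)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def frames(log_text):
    """Return (module, symbol) pairs in the order they appear in the log."""
    return [(m.group(1), m.group(2)) for m in FRAME.finditer(log_text)]

# First four frames of the first trace, copied from the log above.
SAMPLE = """\
Jul  3 15:45:20 cerebro kernel:  [<ffffffff88562a7d>] :dlm:dlm_posix_lock+0x172/0x210
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8009eba4>] autoremove_wake_function+0x0/0x2e
Jul  3 15:45:20 cerebro kernel:  [<ffffffff88591c7a>] :gfs2:gfs2_lock+0xc3/0xcf
Jul  3 15:45:20 cerebro kernel:  [<ffffffff8003a39e>] fcntl_setlk+0x11e/0x273
"""

for module, symbol in frames(SAMPLE):
    print("%-8s %s" % (module or "kernel", symbol))
```

The regular expression finds frames wherever they occur in a line, so it also works on dumps where several frames share one line.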


Before rebooting the node I went into this user's directory and ran some "ls" commands, and everything worked as expected. I was pretty sure the command would hang, but it didn't.
Here is the "ps ax" output:
cicero 24420 0.0 0.0 8960 1220 ? Ds 14:46 0:00 /usr/libexec/dovecot/deliver -f cicero -d cicero
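
The "D" in the STAT column of that ps line marks uninterruptible sleep. A rough way to list every such task without grepping ps output is to read /proc directly. This is a sketch, not a tested tool for this cluster: the names parse_stat and d_state_tasks are made up for illustration, it assumes a Python 3 interpreter, and the wchan file (the kernel function a task is sleeping in) may be missing or uninformative on some kernels, in which case "?" is reported.

```python
import os

def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state

def d_state_tasks(proc_root="/proc"):
    """Return sorted (pid, comm, wchan) tuples for tasks in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was unparsable
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        found.append((pid, comm, wchan))
    return sorted(found)
```

Calling d_state_tasks() on the affected node and printing the result gives one tuple per stuck task. Tasks that pass briefly through D state during normal I/O can also show up, so repeated runs are more telling than a single one.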

I've already rebooted that node, but if there is some way to debug this case more deeply, just let me know; I'll probably hit the same situation again before the end of the day.
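
One thing that could be tried the next time this happens: the traces show the stuck processes inside fcntl_setlk, and a plain "ls" never requests a POSIX lock, which may be why it did not hang. A closer test is to request an fcntl lock on a scratch file on the same mount, from a child process, with a timeout. The sketch below assumes Python 3.3 or later (newer than the system Python on CentOS 5); the name probe_posix_lock and the scratch-file path are hypothetical. It cannot tell which lock is stuck, only whether a new lock request on that file completes.

```python
import subprocess
import sys

# Child program: take an exclusive POSIX (fcntl) lock on the file, then release it.
CHILD = (
    "import fcntl, sys\n"
    "with open(sys.argv[1], 'a') as f:\n"
    "    fcntl.lockf(f, fcntl.LOCK_EX)\n"
    "    fcntl.lockf(f, fcntl.LOCK_UN)\n"
)

def probe_posix_lock(path, timeout=5.0):
    """Return 'ok', 'error', or 'hung' for one lock attempt on `path`."""
    child = subprocess.Popen([sys.executable, "-c", CHILD, path])
    try:
        rc = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deliberately neither killed nor reaped: a task in D state ignores
        # signals, so waiting for it here could hang the probe as well.
        return "hung"
    return "ok" if rc == 0 else "error"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical path on the GFS2 mount): /home/lock-probe.tmp
    print(probe_posix_lock(sys.argv[1]))
```

A "hung" result also occurs if another process legitimately holds a lock on the same file for longer than the timeout, so the scratch file should be one that nothing else uses. The lock is requested in a child so that the shell running the probe stays usable even if the request never returns.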


Thanks in advance.

--

Flávio do Carmo Júnior aka waKKu


------------------------------------------------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
