Alohá!
First off: Sorry für crossposting this but I think it hast to do with
both, cyrus and SASL.
Second off: Beware, long post! (Although a good bit is just logs) :-)
Abstract:
Cyrus or sasl choke after a few POP3 logins following quickly after one
another despite high prefork values.
The Problem:
A relatively aggressive POP3 retrieval plugin for exchange (GFI
Mailessentials) polls 65 accounts at the same time. The logins for the
first few accounts succeed but the remaining tries time out without any
trace, leaving the TCP connection in a CLOSE_WAIT state.
The Test:
Using getmail from another sarge machine to pop 65 blank accounts in one
go I can reproduce the success of approx. 7 logins within the first
second, the remaining take close to the timeout limit or do time out.
Please see the timestamps of the following log:
epsilon:~# tail -f /var/log/mail.log |grep testuser
Oct 13 23:48:16 localhost cyrus/pop3d[18228]: login:
[IP_of_querying_server] testuser1 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[21595]: login:
[IP_of_querying_server] testuser2 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[25583]: login:
[IP_of_querying_server] testuser3 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[14381]: login:
[IP_of_querying_server] testuser4 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[1144]: login:
[IP_of_querying_server] testuser5 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[11874]: login:
[IP_of_querying_server] testuser6 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[21571]: login:
[IP_of_querying_server] testuser7 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[27250]: login:
[IP_of_querying_server] testuser8 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[5194]: login:
[IP_of_querying_server] testuser9 plaintext
Oct 13 23:48:17 localhost cyrus/pop3d[19623]: login:
[IP_of_querying_server] testuser10 plaintext
Oct 13 23:49:15 localhost cyrus/pop3d[12990]: login:
[IP_of_querying_server] testuser11 plaintext
Oct 13 23:49:37 localhost cyrus/pop3d[19480]: login:
[IP_of_querying_server] testuser12 plaintext
Oct 13 23:49:37 localhost cyrus/pop3d[5285]: login:
[IP_of_querying_server] testuser13 plaintext
Oct 13 23:51:32 localhost cyrus/pop3d[22249]: login:
[IP_of_querying_server] testuser14 plaintext
Oct 13 23:51:32 localhost cyrus/pop3d[35]: login:
[IP_of_querying_server] testuser15 plaintext
Oct 13 23:52:37 localhost cyrus/pop3d[30733]: login:
[IP_of_querying_server] testuser16 plaintext
Oct 13 23:52:54 localhost cyrus/pop3d[13558]: login:
[IP_of_querying_server] testuser17 plaintext
Oct 13 23:52:54 localhost cyrus/pop3d[4925]: login:
[IP_of_querying_server] testuser18 plaintext
Oct 13 23:53:17 localhost cyrus/pop3d[20686]: login:
[IP_of_querying_server] testuser19 plaintext
Oct 13 23:53:33 localhost cyrus/pop3d[21112]: login:
[IP_of_querying_server] testuser20 plaintext
Oct 13 23:53:59 localhost cyrus/pop3d[24909]: login:
[IP_of_querying_server] testuser21 plaintext
Oct 13 23:54:00 localhost cyrus/pop3d[16212]: login:
[IP_of_querying_server] testuser22 plaintext
Oct 13 23:54:22 localhost cyrus/pop3d[23280]: login:
[IP_of_querying_server] testuser23 plaintext
Oct 13 23:55:42 localhost cyrus/pop3d[22569]: login:
[IP_of_querying_server] testuser24 plaintext
Oct 13 23:55:42 localhost cyrus/pop3d[1855]: login:
[IP_of_querying_server] testuser25 plaintext
Oct 13 23:56:12 localhost cyrus/pop3d[11290]: login:
[IP_of_querying_server] testuser26 plaintext
Oct 13 23:56:52 localhost cyrus/pop3d[1508]: login:
[IP_of_querying_server] testuser27 plaintext
Oct 13 23:56:52 localhost cyrus/pop3d[11878]: login:
[IP_of_querying_server] testuser28 plaintext
Oct 13 23:57:16 localhost cyrus/pop3d[8838]: login:
[IP_of_querying_server] testuser29 plaintext
Oct 13 23:58:18 localhost cyrus/pop3d[11535]: login:
[IP_of_querying_server] testuser30 plaintext
Oct 13 23:58:18 localhost cyrus/pop3d[20806]: login:
[IP_of_querying_server] testuser31 plaintext
Oct 13 23:58:48 localhost cyrus/pop3d[15110]: login:
[IP_of_querying_server] testuser32 plaintext
Oct 13 23:59:20 localhost cyrus/pop3d[27262]: login:
[IP_of_querying_server] testuser33 plaintext
Oct 13 23:59:53 localhost cyrus/pop3d[28168]: login:
[IP_of_querying_server] testuser34 plaintext
Oct 13 23:59:53 localhost cyrus/pop3d[5339]: login:
[IP_of_querying_server] testuser35 plaintext
Oct 14 00:01:18 localhost cyrus/pop3d[4315]: login:
[IP_of_querying_server] testuser36 plaintext
Oct 14 00:01:57 localhost cyrus/pop3d[8924]: login:
[IP_of_querying_server] testuser37 plaintext
Oct 14 00:01:57 localhost cyrus/pop3d[29483]: login:
[IP_of_querying_server] testuser38 plaintext
Oct 14 00:03:29 localhost cyrus/pop3d[9070]: login:
[IP_of_querying_server] testuser39 plaintext
Oct 14 00:03:30 localhost cyrus/pop3d[29472]: login:
[IP_of_querying_server] testuser40 plaintext
Oct 14 00:04:16 localhost cyrus/pop3d[6504]: login:
[IP_of_querying_server] testuser41 plaintext
Oct 14 00:04:16 localhost cyrus/pop3d[27329]: login:
[IP_of_querying_server] testuser42 plaintext
Oct 14 00:04:38 localhost cyrus/pop3d[32760]: login:
[IP_of_querying_server] testuser43 plaintext
Oct 14 00:05:00 localhost cyrus/pop3d[11606]: login:
[IP_of_querying_server] testuser44 plaintext
Oct 14 00:05:35 localhost cyrus/pop3d[26602]: login:
[IP_of_querying_server] testuser45 plaintext
Oct 14 00:05:35 localhost cyrus/pop3d[5887]: login:
[IP_of_querying_server] testuser46 plaintext
Oct 14 00:06:17 localhost cyrus/pop3d[2847]: login:
[IP_of_querying_server] testuser47 plaintext
Oct 14 00:06:36 localhost cyrus/pop3d[8207]: login:
[IP_of_querying_server] testuser48 plaintext
Oct 14 00:07:41 localhost cyrus/pop3d[4035]: login:
[IP_of_querying_server] testuser49 plaintext
Oct 14 00:07:42 localhost cyrus/pop3d[19858]: login:
[IP_of_querying_server] testuser50 plaintext
Oct 14 00:08:39 localhost cyrus/pop3d[3667]: login:
[IP_of_querying_server] testuser51 plaintext
Oct 14 00:08:39 localhost cyrus/pop3d[25244]: login:
[IP_of_querying_server] testuser52 plaintext
Oct 14 00:09:06 localhost cyrus/pop3d[29777]: login:
[IP_of_querying_server] testuser53 plaintext
Oct 14 00:09:37 localhost cyrus/pop3d[842]: login:
[IP_of_querying_server] testuser54 plaintext
Oct 14 00:10:27 localhost cyrus/pop3d[5103]: login:
[IP_of_querying_server] testuser55 plaintext
Oct 14 00:11:03 localhost cyrus/pop3d[28995]: login:
[IP_of_querying_server] testuser56 plaintext
Oct 14 00:11:34 localhost cyrus/pop3d[12469]: login:
[IP_of_querying_server] testuser57 plaintext
Oct 14 00:11:34 localhost cyrus/pop3d[25816]: login:
[IP_of_querying_server] testuser58 plaintext
Oct 14 00:12:01 localhost cyrus/pop3d[29592]: login:
[IP_of_querying_server] testuser59 plaintext
Oct 14 00:12:31 localhost cyrus/pop3d[28422]: login:
[IP_of_querying_server] testuser60 plaintext
Oct 14 00:13:10 localhost cyrus/pop3d[18671]: login:
[IP_of_querying_server] testuser61 plaintext
Oct 14 00:13:10 localhost cyrus/pop3d[23530]: login:
[IP_of_querying_server] testuser62 plaintext
Oct 14 00:13:36 localhost cyrus/pop3d[18888]: login:
[IP_of_querying_server] testuser63 plaintext
Oct 14 00:14:08 localhost cyrus/pop3d[5330]: login:
[IP_of_querying_server] testuser64 plaintext
Oct 14 00:14:08 localhost cyrus/pop3d[15592]: login:
[IP_of_querying_server] testuser65 plaintext
The Thesis:
Ummmm, is sasldb maybe not suited for a high amount of simultaneous
requests and a race condition occurs that is resolved only by timeouts?
Then again we do have 50-150 logins per minute under normal
circumstances - and that's no problem whatsoever.
The setup:
debian sarge cyrus 2.1.18-1+sarge2
epsilon:~/tmp# grep -v \# /etc/cyrus.conf | grep .
START {
recover cmd="/usr/sbin/ctl_cyrusdb -r"
delprune cmd="/usr/sbin/ctl_deliver -E 3"
tlsprune cmd="/usr/sbin/tls_prune"
}
SERVICES {
imap cmd="imapd -U 30" listen="imap" prefork=75
maxchild=250 maxforkrate=0
imaps cmd="imapd -s -U 30" listen="imaps" prefork=5
maxchild=250 maxforkrate=0
pop3 cmd="pop3d -U 30" listen="pop3" prefork=75
maxchild=250 maxforkrate=0
pop3s cmd="pop3d -s -U 30" listen="pop3s" prefork=1
maxchild=250 maxforkrate=0
lmtpunix cmd="lmtpd" listen="/var/run/cyrus/socket/lmtp"
prefork=0 maxchild=200
sieve cmd="timsieved" listen="localhost:sieve"
prefork=0 maxchild=500
notify cmd="notifyd"
listen="/var/run/cyrus/socket/notify" proto="udp" prefork=1
}
EVENTS {
checkpoint cmd="/usr/sbin/ctl_cyrusdb -c" period=30
delprune cmd="/usr/sbin/ctl_deliver -E 3" at=0401
tlsprune cmd="/usr/sbin/tls_prune" at=0401
squatter cmd="squatter -r *" period=30
}
interesting bits from imapd.conf:
popminpoll: 1
duplicatesuppression: yes
hashimapspool: true
allowplaintext: yes
sasl_mech_list: PLAIN LOGIN
sasl_minimum_layer: 256
sasl_maximum_layer: 256
sasl_pwcheck_method: auxprop
sasl_auxprop_plugin: sasldb
sasl_auto_transition: no
Also noteworthy:
Postfix authenticates users sending mail via rimap (saslauthd) thus also
accessing sasldb. Not at this time of day though, we haven't had a
single mail sent in the past 15 min and hardly any pickups either.
Looking at the imapd.conf just now does make me wonder why sasl even
allows pop3 plaintext logins without TLS with sasl_minimum_layer set to
256... Then again I never really understood how the sasl-layer figures
are calculated, does anyone have a hint for a FM I could R? The manpage
for imapd.conf doesn't really tell.
As You can see from the timestamps this is ruining my weekend already,
I'm at the brink of kicking the server really really hard and any input
would be greatly appreciated. If I can provide any more info...
TIA
Martin