Hi!

Last Friday, one of the nodes in our Cyrus cluster got stuck, with apparently only parts of the network layer still up (it answered pings and carried the RedHat cluster token around). The Cyrus services were down for ~20 minutes before being started on another node.

What surprised me was the behaviour of the timsieved and lmtpd proxies on our Murder frontends. When the backend failed, the proxies with an open connection to it got stuck, too. And so many of them accumulated that the limit on lmtpd / timsieved processes was reached. (I'm still not sure how that happened, since we certainly didn't have that many simultaneous Sieve sessions going on at the time. The LMTP sessions I could almost believe; the amount of email traffic here is considerable.)

The proxies, however, remained stuck. On Friday I did some investigation, and apparently they were stuck on a read on the TCP socket. As I couldn't think of anything else to do, I killed the lmtpd proxies (normally, that is, with signal 15), and that got the LMTP service running again (the Cyrus master on the frontend was able to create new lmtpd processes). But I only noticed the stuck Sieve processes today; they had been stuck on their sockets since Friday.

I wonder why the read apparently never times out?

I'm sorry I cannot provide any more exact data than this; my first priority was to get our Cyrus installation up and running.

--Janne

--
Janne Peltonen <janne.peltonen@xxxxxxxxxxx>
PGP Key ID: 0x9CFAC88B

Please consider membership of the Hospitality Club
(http://www.hospitalityclub.org)
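
P.S. A guess at my own question, in case it helps anyone: if the proxy sits in a plain blocking read() on the backend socket with no receive timeout set and no TCP keepalive enabled, and the backend host goes half-dead without ever sending a FIN or RST (which would fit a node that still answers ping), the kernel has no reason to ever wake the reader, so the read can block indefinitely. Below is a minimal sketch of the socket options that would bound such a wait; the function name and the 60-second value are made up for illustration, and this is not the actual Cyrus proxy code.

    /* Sketch only: bounding a blocking read on a proxy's backend socket.
     * bound_read() and the 60-second value are hypothetical; this is not
     * taken from the Cyrus sources. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    static ssize_t bound_read(int sock, char *buf, size_t len)
    {
        /* Fail the read after 60 seconds of silence instead of waiting forever. */
        struct timeval tv = { .tv_sec = 60, .tv_usec = 0 };
        if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
            return -1;

        /* Let the kernel probe a silent peer.  Without this (or some
         * application-level timeout), a peer that dies without sending
         * a FIN or RST leaves the reader blocked indefinitely. */
        int ka = 1;
        if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &ka, sizeof ka) < 0)
            return -1;

        ssize_t n = read(sock, buf, len);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            fprintf(stderr, "backend read timed out: %s\n", strerror(errno));
        return n;
    }

(The default keepalive interval is long, around two hours on Linux, so in practice one would also tune the keepalive timers or put an application-level timeout around the read.)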