On Sat, 2007-03-03 at 14:23 +1100, Rob Mueller wrote:
> %util - Percentage of CPU time during which I/O requests were issued to the
> device (bandwidth utilization for the device). Device saturation occurs when
> this value is close to 100%.

Can values way above 100% be trusted? If so, it's pretty bad (this is from a
situation where there are 200 lmtp processes, which is the current limit I
set):

avg-cpu:   %user   %nice %system %iowait   %idle
            2.53    0.00    5.26   89.98    2.23

Device:      rrqm/s wrqm/s   r/s    w/s rsec/s  wsec/s  rkB/s   wkB/s avgrq-sz avgqu-sz   await   svctm    %util
etherd/e0.0    0.00   0.00  5.87 235.02 225.10 2513.77 112.55 1256.88    11.37     0.00  750.32  750.32 18074.51

avg-cpu:   %user   %nice %system %iowait   %idle
            1.72    0.00    3.73   94.45    0.10

Device:      rrqm/s wrqm/s   r/s    w/s rsec/s  wsec/s  rkB/s   wkB/s avgrq-sz avgqu-sz   await   svctm    %util
etherd/e0.0    0.00   0.00  4.44 140.73 317.74 1125.00 158.87  562.50     9.94     0.00 2500.46 2500.46 36296.94

> The other thing of interest would be the load on the machine, and processes
> in D state.

Load average tends to get really high. It starts increasing really fast after
the number of lmtpd processes reaches the limit set in cyrus.conf, and can
easily get to 150 or 200.

One of the moments where the problem becomes significant is when our MTAs run
their deferred queues. We have around a dozen MTAs, and when they all run
their queues, the number of connections to lmtpd increases. While these
connections finish very quickly on our other mailbox servers, on this one they
take a long time, and most of the time I have to restart cyrus, because the
number of processes never goes back down, and so connections start being
refused.

The differences between the two kinds of servers are:

- The ones that don't have the problem use local disks instead of AoE
- The ones that don't have the problem are limited to 2000 domains (around
  8000 accounts), while the one using the AoE storage serves 4000 domains
  (around 20000 accounts).

Anyone running cyrus with that many accounts?
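
As a rough cross-check on the %util question, the figures in the two iostat samples above can be compared against each other. The sketch below assumes that iostat derives svctm and %util from the same busy-time counter, so that %util should be about (r/s + w/s) * svctm / 10 with svctm in milliseconds; the function name is made up for illustration. If the numbers agree, that only shows the output is internally consistent, not that the underlying counter is right for the AoE device.

```python
# Figures copied from the two iostat samples for etherd/e0.0.
samples = [
    {"r_s": 5.87, "w_s": 235.02, "svctm_ms": 750.32, "util_pct": 18074.51},
    {"r_s": 4.44, "w_s": 140.73, "svctm_ms": 2500.46, "util_pct": 36296.94},
]


def implied_util_pct(r_s, w_s, svctm_ms):
    """Hypothetical helper: busy time implied by request rate times service time.

    (requests per second) * (ms per request) gives busy ms per second;
    dividing by 1000 ms and multiplying by 100 turns that into a percentage.
    """
    busy_ms_per_s = (r_s + w_s) * svctm_ms
    return busy_ms_per_s / 1000.0 * 100.0


for s in samples:
    implied = implied_util_pct(s["r_s"], s["w_s"], s["svctm_ms"])
    rel_diff = abs(implied - s["util_pct"]) / s["util_pct"]
    print(f"reported {s['util_pct']:.2f}  implied {implied:.2f}  rel. diff {rel_diff:.1e}")
```

Both samples also show avgqu-sz at 0.00 and svctm equal to await, which is worth keeping in mind when deciding how much weight to give the %util column.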
> ps auxw | grep -v ' S'

root      1743  0.0  0.0      0     0 ?        D    Mar01   0:05 [xfssyncd]
root      3116  0.0  0.0      0     0 ?        D    Mar01   0:01 [xfssyncd]
cyrus    15593  0.0  0.3  36288 13660 ?        D    11:48   0:00 imapd
cyrus    16360  0.0  0.3  37752 14360 ?        D    11:54   0:00 imapd
cyrus    17161  0.0  0.3  36304 13648 ?        D    11:59   0:00 imapd
cyrus    17182  0.0  0.0 120736  3268 ?        D    12:00   0:00 lmtpd
cyrus    17891  0.0  0.0 120872  3108 ?        D    12:04   0:00 lmtpd
cyrus    17897  0.0  0.0 120696  3312 ?        D    12:04   0:00 lmtpd
cyrus    18265  0.0  0.0 120896  3540 ?        D    12:07   0:00 lmtpd
cyrus    18302  0.0  0.0 120760  3432 ?        D    12:07   0:00 lmtpd
cyrus    18336  0.0  0.0 120720  2684 ?        D    12:07   0:00 lmtpd
cyrus    18441  0.0  0.0 120684  2944 ?        D    12:08   0:00 lmtpd
cyrus    18590  0.0  0.0 120920  3156 ?        D    12:09   0:00 lmtpd
cyrus    18591  0.0  0.0 120724  2584 ?        D    12:09   0:00 lmtpd
cyrus    18592  0.0  0.0 121332  2796 ?        D    12:09   0:00 lmtpd
cyrus    18612  0.0  0.0 120716  3224 ?        D    12:09   0:00 lmtpd
cyrus    18613  0.0  0.0 120716  3140 ?        D    12:09   0:00 lmtpd
cyrus    18632  0.0  0.0 120696  3072 ?        D    12:09   0:00 lmtpd
cyrus    18641  0.0  0.0 120676  2864 ?        D    12:09   0:00 lmtpd
cyrus    18643  0.0  0.0 120720  2696 ?        D    12:09   0:00 lmtpd
cyrus    18656  0.0  0.0 120692  3340 ?        D    12:09   0:00 lmtpd
cyrus    18657  0.0  0.0 120676  2996 ?        D    12:09   0:00 lmtpd
cyrus    18658  0.0  0.0 120716  2804 ?        D    12:09   0:00 lmtpd
cyrus    18669  0.0  0.0 120680  2812 ?        D    12:09   0:00 lmtpd
cyrus    18671  0.0  0.0 120716  2712 ?        D    12:09   0:00 lmtpd
cyrus    18939  0.0  0.0 120692  2732 ?        D    12:11   0:00 lmtpd
cyrus    18941  0.0  0.0 120716  3148 ?        D    12:11   0:00 lmtpd
cyrus    18942  0.0  0.0 120752  2924 ?        D    12:11   0:00 lmtpd
cyrus    18944  0.0  0.0 120704  2612 ?        D    12:11   0:00 lmtpd
cyrus    18947  0.0  0.0 120688  2676 ?        D    12:11   0:00 lmtpd
cyrus    18948  0.0  0.0 120688  2336 ?        D    12:11   0:00 lmtpd
cyrus    18950  0.0  0.0 120684  2920 ?        D    12:11   0:00 lmtpd
cyrus    18951  0.0  0.0 124080  2764 ?        D    12:11   0:00 lmtpd
cyrus    18978  0.0  0.0 120712  3304 ?        D    12:11   0:00 lmtpd
cyrus    18979  0.0  0.0 120740  2872 ?        D    12:11   0:00 lmtpd
cyrus    19014  0.0  0.0 120712  2656 ?        D    12:11   0:00 lmtpd
cyrus    19016  0.0  0.0 120708  2880 ?        D    12:11   0:00 lmtpd
cyrus    19089  0.0  0.0 120692  2596 ?        D    12:12   0:00 lmtpd
cyrus    19123  0.0  0.3  36240 13540 ?        D    12:12   0:00 imapd
cyrus    19153  0.0  0.0  38012  3076 ?        D    12:12   0:00 pop3d
cyrus    19179  0.0  0.0 120812  2660 ?        D    12:12   0:00 lmtpd
cyrus    19183  0.0  0.0 120712  2924 ?        D    12:12   0:00 lmtpd
cyrus    19199  0.0  0.0 120696  2644 ?        D    12:12   0:00 lmtpd
cyrus    19200  0.0  0.0 120712  3236 ?        D    12:12   0:00 lmtpd
cyrus    19201  0.0  0.0 120692  2668 ?        D    12:12   0:00 lmtpd
cyrus    19263  0.0  0.0 122076  2836 ?        D    12:13   0:00 lmtpd
cyrus    19292  0.0  0.0 120712  2672 ?        D    12:13   0:00 lmtpd
cyrus    19298  0.0  0.0 121168  2764 ?        D    12:13   0:00 lmtpd
cyrus    19329  0.0  0.0 120796  2716 ?        D    12:13   0:00 lmtpd
cyrus    19338  0.0  0.0 120696  2524 ?        D    12:13   0:00 lmtpd
cyrus    19344  0.0  0.0  36536  3308 ?        D    12:13   0:00 imapd
cyrus    19372  0.0  0.0 121688  2640 ?        D    12:13   0:00 lmtpd
cyrus    20020  0.0  0.0  35940  2952 ?        D    12:17   0:00 pop3d
cyrus    20495  0.0  0.0  35936  2488 ?        D    12:20   0:00 pop3d
root     20629  0.0  0.0   2764   820 pts/0    R+   12:21   0:00 ps auxw

Thanks for the help,
Andre

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html