Hi,

>>> I enabled replication between two servers with version 2.4.10 cyrus.
>>> I set the option for the rolling replication, and it works fine but
>>> obviously I have a high CPU load.
>>> Unfortunately after 10 minutes of running processes pop3d increasing
>>> from 50 to over 200, making the server unusable for customers.
>>> Can you tell me why this increase is abnormal?

>> Can you use something like 'top' to work out which processes are
>> consuming most of the CPU time?

Thanks for the screenshots.

> This screenshot of top before :
>
> http://www.digicolor.net/cyrus/img1.jpg

This shows a load average of around 1. That means that, on average over
the past 1 and 5 minutes, 1 process has been waiting in the run queue,
ready to go. This is therefore not an entirely idle machine. I see you're
running a nameserver and a few other things: it looks like they've been
busy on the CPU but not excessively so.

What's more worrying is the 4.5% of CPU time spent "waiting". This time is
accrued when processes are unable to run due to outstanding IO.

> and after 20 minutes of rolling replication :
>
> http://www.digicolor.net/cyrus/img5.jpg

This shows a load average of around 7 and 30% of CPU time spent in iowait.
This machine does not seem to be managing well with the IO load of rolling
replication.

>> Can you use something like 'vmstat 1' to show us how much I/O there is
>> on the system?
>
> This screenshot of top before :
>
> http://www.digicolor.net/cyrus/img3.jpg

This shows a system that is not reading anything from disk (bi). A small
number of blocks are being written out to disk (bo). Each line represents
activity for a period of 1 second, as specified by the parameter to
'vmstat'.

> and after 20 minutes of rolling replication :
>
> http://www.digicolor.net/cyrus/img6.jpg

This system is writing to disk but it's very choppy. Sometimes it's getting
7,000 blocks out per second and other times it's only 1,000.
Depending on your block size, this probably represents only a few MB per
second. The last column shows the iowait CPU percentage, and it's rather
large.

What IO subsystem do you have on this machine? What filesystem are you
using? The IO on this machine appears to be struggling significantly.

I did a quick test on my laptop. I have a 2.5", 7,200rpm 200GB disk. I ran
this in my home directory to cause every file to be read from disk:

-----
$ find -type f | xargs cat > /dev/null
-----

'vmstat 1' gives lines like this:

-----
procs -----------memory----------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  1   1680  30600    272 2575688    0    0 12376   576 2141 3424  3  6 47 43
 1  1   1680  32396    272 2576736    0    0 29988     0 1882 3632  5  7 47 41
 0  1   1680  33868    272 2576660    0    0 46820     0 2304 4443  4  8 48 39
 1  0   1680  33416    272 2578600    0    0 36716     0 2067 3733  3  7 48 42
 0  1   1680  34000    272 2581944    0    0 50432     0 1164 2983  3  6 50 42
 0  1   1680  31876    272 2585320    0    0 46464    64 1223 2964  3  8 49 40
 1  1   1680  30288    272 2588672    0    0 51712     0 1380 3658  3  7 46 43
 0  1   1680  29836    272 2590552    0    0 59776     0 1288 3549  4  7 47 42
 0  1   1680  30324    272 2592948    0    0 58368     0 1287 3568  2  7 49 41
 1  1   1680  30308    272 2593108    0    0 12800    18  917 1673  2  2 49 46
-----

These throughput figures are an order of magnitude greater than what you're
seeing. As you can see, I drop a few bi when I start to do bo but that's
because I've only got a single spindle.

Please can you run the same test?

Can you track the source of all those writes in img3?

Please can you tell us more about the type of machine you are trying to run
this on?

Thanks for the info and screenshots so far.

>> Are most of the pop3d processes sleeping in iowait?

>> Do you use any other servers such as the imapd?
> Yes I have imapd
>
> This is screenshot of pstree before :
>
> http://www.digicolor.net/cyrus/img2.jpg
>
>
> This is screenshot of pstree after :
>
> http://www.digicolor.net/cyrus/img4.jpg

Regards,
@ndy

--
andyjpb@xxxxxxxxxxxxxx
http://www.ashurst.eu.org/
0x7EBA75FF
----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/