At the risk of being yet one more techie who thinks he has a workaround...

I've been back doing Cyrus administration for the past two months, after a three-year break. I previously ran the Cyrus instance at Duke University, and I'm now getting up to speed to run the one at UNC. At Duke we started as a multi-host install and moved to a single instance just as I was leaving. Here at UNC, we've been on a single instance for years. Both places have run Solaris all along, and both have had over 50k users and several million incoming messages a day. Part of how we handle it here is massive hardware: an 8-processor Sun 6800 with the processor boards swapped out for UltraSPARC IV boards. Even so, that hardware is a couple of years old at this point. That said, our CPU load is really pretty minimal.

We're on a very old version of Cyrus right now (1.6). Even so, from reading this thread I think I've got a good feel for what you're looking at. There's been a lot of talk about the linked list in the kernel, and about the fact that all processes with the file mmap'ed freeze whenever the file gets written. If traversing that linked list were really the problem, I think we would have seen a total system meltdown here a long time ago. I'm much more inclined to think that what you're running into is all of the processes freezing for the duration of each rewrite of the mailboxes file. This won't show up as I/O blocking on your disks, since there won't be any real contention for that file or even for the channel. But the latency of each write, while only a few milliseconds, is going to kill you if your mailboxes file gets big.

I haven't had any role yet in the design and configuration of UNC's system, but there's one thing we have that I think saves us an enormous amount of pain.
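The rewrite latency argued above is easier to see in code. Below is a minimal Python sketch of the lock, rewrite-everything, rename() update that a flat-text mailboxes file implies. It is not Cyrus code. The function names `_lock_current` and `rewrite_mailboxes`, the tab-separated `name<TAB>partition` record layout, and the use of `flock()` are all illustrative assumptions, and `fcntl` requires a Unix system. The point is that the whole file is rewritten and fsync()ed while an exclusive lock is held. Every other writer therefore waits out the full device write latency, and the amount of data written grows with the size of the file.

```python
import fcntl
import os
import tempfile
import time


def _lock_current(path: str):
    """Open `path` and take an exclusive flock, retrying if it was replaced.

    After a writer rename()s a new copy into place, a process that was blocked
    on the old file's lock would wake up holding a lock on a stale inode, so we
    compare the locked inode with whatever `path` names now.
    """
    while True:
        f = open(path, "r+")
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks while another writer holds it
        locked = os.fstat(f.fileno())
        try:
            current = os.stat(path)
        except FileNotFoundError:
            f.close()
            continue
        if (locked.st_dev, locked.st_ino) == (current.st_dev, current.st_ino):
            return f
        f.close()  # someone swapped in a new file; go lock that one instead


def rewrite_mailboxes(path: str, name: str, partition: str) -> float:
    """Insert or update one record by rewriting the whole file; return seconds taken.

    Hypothetical sketch: the name<TAB>partition layout is an assumption made
    for illustration, not the real Cyrus 1.6 on-disk format.
    """
    start = time.perf_counter()
    with _lock_current(path) as live:
        mode = os.fstat(live.fileno()).st_mode & 0o777
        # Read every record, even though only one is changing.
        records = dict(
            line.rstrip("\n").split("\t", 1) for line in live if line.strip()
        )
        records[name] = partition
        # Write a complete new copy in the same directory (same file system),
        # which is what lets rename() replace the original atomically.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".mailboxes.")
        with os.fdopen(fd, "w") as tmp:
            os.fchmod(tmp.fileno(), mode)  # keep the original permissions
            for key in sorted(records):
                tmp.write(f"{key}\t{records[key]}\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # wait for the device: this is the write latency
        os.rename(tmp_path, path)  # atomic swap; readers see old or new, never half
    # The lock is released when `live` is closed at the end of the with-block.
    return time.perf_counter() - start
```

Timing `rewrite_mailboxes()` against a scratch copy of a large file on each candidate device is one rough way to compare spinning disks with a low-latency device such as a solid-state disk. Never run it against a live mailboxes file. The inode re-check after locking is there because, with rename()-based updates, the file a waiting writer locked may no longer be the one the path points to.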
Since we're still on 1.6, and hence using the "plain text" mailboxes format, bear in mind that every change to the mailboxes database involves a lock on the file, a complete rewrite of the file into a new copy next to it on the same file system, and a rename() system call. This is SLOOOWWW.

How are we not dead? A solid-state disk for the partition holding the mailboxes database. This thing is amazing. We've got one of the gizmos with a battery backup and a RAID array of Winchester disks that it writes out to if it loses power, and its latency is practically non-existent. Writes to the mailboxes database return almost instantaneously compared with writes to regular spinning disks. In my experience, that disk write latency is bound to be a much bigger chunk of time than traversing a linked list in kernel memory. For anyone doing a big Cyrus install, I would strongly recommend this.

Michael Bacon
ITS - UNC Chapel Hill

--On Friday, November 09, 2007 10:35 AM -0800 Vincent Fox <vbfox@xxxxxxxxxxx> wrote:

> Jure Pečar wrote:
>> In my experience the "brick wall" you describe is what happens when disks
>> reach a certain point of random IO that they cannot keep up with.
>>
>
> The problem with a technical audience is that everyone thinks they have
> a workaround or probable fix you haven't already thought of. No offense.
> I am guilty of it myself, but it's sometimes very hard to say "I DON'T
> KNOW" and dig through telemetry and instrument the software until you
> know all the answers.
>
> With something as complex as Cyrus, this is harder than you think.
> Unfortunately, when it comes to something like a production mail service
> these days, it's nearly impossible to get the funding, man-hours, and
> approvals to run experiments on live guinea pigs to really get to the
> bottom of problems. We throw systems at the problem and move on.
>
> But in answer to your point, our iostat numbers for busy or service
> time didn't indicate any I/O issue.
> That was the first thing we looked at, of course.
> Even by eyeball, our array drives are more idle than busy.
>
> ----
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html