Could someone whip up a small test that could be used to check different operating systems (and filesystems) for this concurrency problem?
It doesn't even need to use any Cyrus code (in fact, it would probably be better if it didn't).
It sounds like there are a couple of different aspects to check:

1. Large number of copies of a single program running; find the impact of starting and stopping a process.
   1a. A single process that forks lots of copies.
   1b. A master process that execs lots of copies.
2. Large number of processes mmapping a single file.
   2a. The impact of adding or removing a process from this group.
   2b. The impact of modifying this file.

Personally, I expect 1a and 1b to be significantly different on different OSes. Some OSes will gain huge memory savings in 1a due to copy-on-write (to partially account for this, it may be worth making the program allocate a chunk of RAM and write to it after the fork), while on other OSes the overhead of multiple mappings of a page will dominate.
David Lang
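As a rough starting point for test 2 above, here is a minimal sketch in Python (not Cyrus code; all the counts and sizes are arbitrary, and it assumes a Unix-like OS with os.fork and MAP_SHARED). It forks a number of processes that each hold a shared mapping of one file, then times modifications to that file from the parent:

```python
# Hypothetical micro-benchmark for test 2: many processes mmap one file,
# one process modifies it. Counts and sizes are arbitrary illustrations.
import mmap
import os
import signal
import tempfile
import time

NUM_READERS = 40          # processes that just hold a shared mapping
FILE_SIZE = 1024 * 1024   # 1 MiB backing file
WRITES = 1000             # modifications to time from the parent

fd, path = tempfile.mkstemp()
os.ftruncate(fd, FILE_SIZE)

children = []
for _ in range(NUM_READERS):
    pid = os.fork()
    if pid == 0:
        # Child: map the shared file, touch a page, and wait to be killed.
        child_map = mmap.mmap(fd, FILE_SIZE, mmap.MAP_SHARED)
        _ = child_map[0]
        time.sleep(60)
        os._exit(0)
    children.append(pid)

# Parent: modify the file through its own mapping while all the readers
# hold theirs, and time it. On an affected kernel this should degrade
# sharply as NUM_READERS grows.
parent_map = mmap.mmap(fd, FILE_SIZE, mmap.MAP_SHARED)
start = time.monotonic()
for i in range(WRITES):
    parent_map[i % FILE_SIZE] = i % 256
elapsed = time.monotonic() - start
print(f"{WRITES} writes with {NUM_READERS} mappers took {elapsed:.4f}s")

for pid in children:          # clean up the reader processes and the file
    os.kill(pid, signal.SIGTERM)
    os.waitpid(pid, 0)
os.close(fd)
os.unlink(path)
```

Running it with increasing NUM_READERS on each candidate OS, and comparing the timings, would cover 2a/2b; tests 1a/1b would need a similar harness that forks or execs copies of one binary.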
--On Tuesday, October 16, 2007 3:39 PM -0700 Vincent Fox <vbfox@xxxxxxxxxxx> wrote:
------------ Omen Wild (University of California Davis)

The root problem seems to be an interaction between Solaris' concept of global memory consistency and the fact that Cyrus spawns many processes that all memory map (mmap) the same file. Whenever any process updates any part of a memory-mapped file, Solaris freezes all of the processes that have that file mmapped, updates their memory tables, and then re-schedules the processes to run. When we have problems, we see the load average go extremely high and no useful work gets done by Cyrus. Logins get processed by saslauthd, but listing an inbox either takes a long time or completely times out.

Apparently AIX also runs into this issue. I talked to one email administrator who had this exact issue under AIX. That admin talked to the kernel engineers at IBM, who explained that this is a feature, not a bug. They eventually switched to Linux, which solved their issues, although they did move to more Linux boxes with fewer users per box.
Oh man... Horrible memories just flood right back... Wow. I was reading your e-mail and thinking to myself that this sounded like the same problem we had. Then I got to the above section and *bam*, there it was...

We had significant problems with our e-mail last year (this year was a perfect start!) a week before students came back. We didn't resolve the problems until the end of September, and we were dismayed at our final solution. We run Tru64 5.1b on a 4-member cluster. Tru64's kernel suffers from the exact same issue as described above. We regularly have 12,000 Cyrus procs running at any one time during the day, and that cluster also receives on average 300k-500k e-mails each day (that is after spam/virus work).

What was finally identified was that the number of "processes" mapped to that single physical "executable" (/usr/cyrus/imapd) was causing a lot of lock contention in the kernel. The executable would have a linked list, in kernel memory, of all the processes running off of it. When one of the processes went away, the kernel would start at the beginning of the list and search for the process in order to clean up its resources. During that time, the kernel would lock everything and execution would essentially stop (basically, the whole system appeared to simply freeze on us). The kernel would reach a time threshold and stop in order to let other things happen (unfreeze). This time was very short, but if we had a lot of processes going away in a very short period of time, we would noticeably see the freeze, since the kernel was going into this lock-down mode many times in quick succession. That is a simplified view of what really happened. HP recommends that we keep the linked list down to only a few hundred processes at most.
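[The linked-list-versus-hash tradeoff HP ran into can be seen in miniature with a toy Python comparison (my illustration, not anything like the actual Tru64 kernel code): removing an entry from a long list scans from the head, while removing it from a hash table is a constant-time lookup.]

```python
# Toy illustration: per-removal cost of a scanned list vs. a hash table.
import time

N = 200_000
procs_list = list(range(N))          # like the kernel's per-binary linked list
procs_dict = dict.fromkeys(range(N)) # like the proposed hash

t0 = time.monotonic()
procs_list.remove(N - 1)             # scans ~N entries from the head
t_list = time.monotonic() - t0

t0 = time.monotonic()
del procs_dict[N - 2]                # one hash lookup
t_dict = time.monotonic() - t0

print(f"list removal: {t_list:.6f}s   hash removal: {t_dict:.6f}s")
```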
They were working on a kernel patch to make it a hash instead of a linked list, but as they got deeper into making the patch, they found that it impacts a lot more than they initially realized. The last I heard, this might make it into the PK7 patch release, which is likely sometime next year.

Meanwhile, we hacked around this in a very cool way. We copied the imapd binary 60 times (assuming an average of 12,000 processes and shooting for 200 processes per executable, that is 60 individual executables). These were named /usr/cyrus/bin/imapd_001 through /usr/cyrus/bin/imapd_060. We then symlinked the "imapd" binary to imapd_001, and wrote a cron job that ran once a minute and relinked the imapd symlink to the next numbered executable, rotating around to imapd_001 when the end was reached. This worked like a charm and *all* of our problems went away... In fact, our system has continued to get busier and we are still running pretty well. I don't think the hack is ideal, but man, does it work!

Scott
--
+-----------------------------------------------------------------------+
 Scott W. Adkins               Work (740)593-9478    Fax (740)593-1944
 UNIX Systems Engineer         <mailto:adkinss@xxxxxxxx>
+-----------------------------------------------------------------------+
 PGP Public Key <http://edirectory.ohio.edu/?$search?uid=adkinss>
----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html