Re: Does anyone else see skiplist recovery errors?

"Robert Mueller" <robm@xxxxxxxxxxx> · Thu, 15 Jun 2006 17:06:54 +1000

> However, it may also depend on the way cyrus-imapd is stopped by the
> system. At least on RedHat/Fedora, the function used by the init scripts
> sends a TERM to the master, and if it doesn't die within some time, it
> sends a KILL, which _could_ result in corrupt on-disk data if I
> understand it correctly. Maybe on very large and busy servers, the method
> used by RedHat/Fedora is not so good. Maybe the stop function is really
> important and should be handled the way it usually is for other
> slow-stopping daemons such as squid.
> How exactly do you stop cyrus?

We send a TERM signal to master, and if it doesn't clean up properly, we then
send a KILL signal. However, I don't remember ever seeing it need the KILL
signal; TERM normally seems fine.
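For anyone wanting to script the same TERM-then-KILL sequence, here is a minimal Python sketch. It assumes the daemon was started as a child via `subprocess.Popen`; real init scripts usually work from a pidfile instead. The function name `stop_daemon` and the 30-second default grace period are illustrative assumptions, not Cyrus or RedHat defaults.

```python
import signal
import subprocess


def stop_daemon(proc: subprocess.Popen, grace_seconds: float = 30.0) -> str:
    """Hypothetical helper: SIGTERM, wait up to grace_seconds, then SIGKILL.

    Returns a label for how the process ended.
    """
    if proc.poll() is not None:
        return "already exited"
    # Ask the master to shut down cleanly first.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # It did not exit in time, so force it. This is the step that can
        # interrupt a database write in progress.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"
```

Usage would be something like `stop_daemon(master_proc, grace_seconds=60)`, where `master_proc` is a hypothetical handle to the running master. A longer grace period on a large, busy server makes the SIGKILL branch, the one that can interrupt a write, less likely to fire.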

Anyway, despite that, it still shouldn't corrupt the DB, should it? I thought
the point of a transactional/logging DB like skiplists is that killing a
process accessing it at any time should not corrupt the DB; it should just
"roll back" to the last transaction point. Maybe skiplists aren't designed to
be "kill"-safe, but they definitely should be!
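To illustrate the "roll back to the last transaction point" idea, here is a toy append-only log in Python where each transaction ends with a commit marker and recovery replays only committed transactions. This is a generic sketch with assumed names (`CommitLog`, a JSON-lines file format); it is not how the Cyrus skiplist format is actually laid out on disk.

```python
import json
import os


class CommitLog:
    """Hypothetical toy format, not the Cyrus skiplist layout.

    Key/value updates become visible only once a commit marker follows them.
    """

    def __init__(self, path: str):
        self.path = path

    def write_transaction(self, updates: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            for key, value in updates.items():
                f.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")
            f.write(json.dumps({"op": "commit"}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # the commit is durable only once it hits the disk

    def recover(self) -> dict:
        """Replay the log, applying only transactions that reached a commit marker."""
        state, pending = {}, {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn write at the tail: stop replaying here
                if rec["op"] == "set":
                    pending[rec["key"]] = rec["value"]
                elif rec["op"] == "commit":
                    state.update(pending)
                    pending.clear()
        # Anything still in `pending` belonged to an interrupted transaction.
        return state
```

If the process is killed partway through `write_transaction`, `recover()` simply ignores the unfinished records. A real store would also truncate the log at the torn tail before appending again, which this sketch leaves out.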

> Anyway, I'm not happy with how we can handle skiplist DBs. There are no
> easy recovery tools that can be used to fix things other than doing it by
> hand. I mean, something which can be automated easily.

Agreed on that one as well.

Rob

----
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html