What version of BDB are people using?

I'm just trying to get an informal survey of which version of Berkeley DB people are using successfully in large cyrus environments. We're currently using:

db4-4.2.52-3.1 - old redhat based machines
libdb4.2.52-18 - newer debian based machines

Both of them seem to be a bit "flakey". We only use BDB for the deliver_db and use:

duplicate_db: berkeley-nosync

For the others we use the recommended skiplist (mailboxes, seen) or flat file (sub).

Basically what we see is that every now and then something goes wrong somewhere inside BDB and causes lots of processes to get caught in a "busy wait" loop. Stracing those processes, you see something like this:

select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
...

Just over and over again very quickly (since each sleep is only about 2 milliseconds). Once this starts happening, lots of processes get caught in this state very quickly and the load on the machine skyrockets. If you run the BDB tool "db_stat" on the environment, you'll see the transaction count quickly increase towards whatever is set as set_tx_max in DB_CONFIG. Once it hits that, BDB goes into an error state, starts filling the cyrus logs with errors, and you have to completely restart cyrus and delete the dbs. It tends to happen anywhere between twice a week and once every 2 months per machine. It's very unpredictable when it happens, and hard to work out what's causing it or what's going on.

Given the way it's calling select() over and over as a "microsleep" mechanism, it seems like it's waiting for some flag to be set in some shared memory that's never being set due to a deadlock or something, thus causing every other process accessing the db to busy wait deadlock as well. Of course, that's just a guess.
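
To make that guess concrete, here is a minimal Python sketch of the pattern the strace output suggests: a process polls a flag and, between polls, "sleeps" by calling select() with no file descriptors and a 2000 microsecond timeout. This is not BDB's actual code. The names microsleep, wait_for_flag and max_polls are made up for illustration, and the poll bound exists only so the example terminates (the loop seen in strace appears to have no such bound). It assumes a POSIX system, where select() accepts empty descriptor sets.

```python
import select


def microsleep(seconds):
    """Sleep by calling select() with no descriptors and only a timeout."""
    select.select([], [], [], seconds)


def wait_for_flag(is_set, interval=0.002, max_polls=50):
    """Poll is_set() until it returns True, sleeping `interval` between polls.

    Returns the number of polls performed, or None if max_polls was reached.
    max_polls is a hypothetical safety bound added for this illustration.
    """
    for polls in range(1, max_polls + 1):
        if is_set():
            return polls
        microsleep(interval)  # shows up in strace as select(0, NULL, NULL, NULL, {0, 2000})
    return None


# A flag that gets released on the third check: the waiter returns normally.
state = {"checks": 0}


def released_on_third_check():
    state["checks"] += 1
    return state["checks"] >= 3


result_ok = wait_for_flag(released_on_third_check)

# A flag that is never released: the waiter spins until the bound is hit.
result_stuck = wait_for_flag(lambda: False, max_polls=5)
```

If whoever is supposed to set the flag never does, every waiter stays in the second case, which would match the endless stream of select() timeouts above.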

So what I'm wondering is:
1. Has anyone else seen this sort of behaviour?
2. What versions of BDB are other people using successfully?
3. What size installation are you using it on (number of mailboxes? messages per minute delivered?)
4. Has anyone had any success using the berkeley-hash-nosync option? I tried that, and it gave me errors about "invalid page 0 type" or something like that pretty quickly.

I'm hoping we can build up some consensus on which version of BDB is the most stable to use with cyrus...

Thanks

Rob

