Re: [389-devel] RFC: New Design: Fine Grained ID List Size

Rich Megginson <rmeggins@xxxxxxxxxx> · Fri, 13 Sep 2013 15:41:16 -0600

On 09/13/2013 02:39 PM, David Boreham wrote:
On 9/13/2013 2:18 PM, Rich Megginson wrote:
On 09/12/2013 07:08 PM, David Boreham wrote:
On 9/11/2013 11:41 AM, Howard Chu wrote:

Just out of curiosity, why is keeping a count per key a problem? If 
you're using BDB duplicate key support, can't you just use 
cursor->c_count() to get this? I.e., BDB already maintains key 
counts internally, why not leverage that?
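To make "keeping a count per key" concrete, here is a toy in-memory model in Python. It is not the BDB API; the class and method names are hypothetical. The count for each key is adjusted on every insert and delete, so reading it is a dictionary lookup rather than a walk over the duplicates.

```python
from collections import defaultdict


class CountedIndex:
    """Hypothetical toy index: key -> set of entry IDs, plus a per-key count.

    Not Berkeley DB; it only illustrates maintaining a counter on each
    write so that reading the count never has to walk the duplicates.
    """

    def __init__(self):
        self._ids = defaultdict(set)
        self._counts = defaultdict(int)

    def add(self, key, entry_id):
        # Only bump the counter when the duplicate is actually new.
        if entry_id not in self._ids[key]:
            self._ids[key].add(entry_id)
            self._counts[key] += 1

    def remove(self, key, entry_id):
        # Only decrement when the duplicate was really present.
        if entry_id in self._ids[key]:
            self._ids[key].discard(entry_id)
            self._counts[key] -= 1

    def count(self, key):
        # O(1): return the maintained counter instead of scanning IDs.
        return self._counts.get(key, 0)


idx = CountedIndex()
for i in range(10):
    idx.add("multikey", i)
idx.add("key0", 100)
idx.remove("multikey", 3)
print(idx.count("multikey"))  # 9
print(idx.count("key0"))      # 1
print(idx.count("missing"))   # 0
```

The cost moves to the write path: every add or remove touches the counter, which is the kind of write-side penalty raised in the reply about DB_RECNUM.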

afaik you need to pass the DB_RECNUM flag at DB creation time to get 
record counting behavior, and it imposes a performance and 
concurrency penalty on writes. Also afaik 389DS does not set that 
flag except on VLV indexes (which need it, and coincidentally were 
the original reason for the feature being added to BDB).
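A rough illustration of why record counting costs something on writes: one common way to support record numbers in a tree is to store, in every internal node, the number of records beneath it. Each insert must then update the count in every node on the path from the root, so every writer modifies the root. The sketch below is a toy binary search tree with subtree sizes (an order-statistic tree), not BDB's btree page layout; it only records which nodes had their counts changed.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of records in this subtree


def insert(root, key, touched):
    """Insert key, appending the key of every node whose size is updated."""
    if root is None:
        return Node(key)
    node = root
    while True:
        node.size += 1             # every ancestor's count changes
        touched.append(node.key)
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


root = None
root_updates = 0
for k in ["key5", "key2", "key8", "key1", "key9", "key3"]:
    touched = []
    root = insert(root, k, touched)
    if touched and touched[0] == "key5":
        root_updates += 1

print(root.size)     # 6
print(root_updates)  # 5: every insert after the first updated the root
```

Because the root's count is written by every insert, concurrent writers all contend for the same spot in the tree.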

I'm using bdb 4.7 on RHEL 6.
Looking at the code, it appears the dbc->count method for btree is
__bamc_count() in bt_cursor.c. I'm not sure, but it looks as though
this function has to iterate over every page, counting the duplicates
on each one, which makes it a non-starter. Unless I'm mistaken, it
does not keep a counter that is updated on each write and simply
returned. I don't see any code that would make the behavior differ
depending on whether DB_RECNUM is used when the database is created.
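For comparison, a toy version of the counting strategy described above: if the duplicates for a key are spread across fixed-size pages and no running total is kept, counting them means visiting every page. The page size and layout here are made up for illustration. They do not reflect BDB's on-disk format or the actual __bamc_count() code.

```python
def paginate(items, page_size):
    """Split a list of duplicate IDs into fixed-size pages (hypothetical layout)."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def count_by_scanning(pages):
    """Count duplicates by visiting every page; returns (count, pages_visited)."""
    total = 0
    visited = 0
    for page in pages:
        visited += 1
        total += len(page)
    return total, visited


ids = list(range(10_000))             # 10,000 duplicates for one key
pages = paginate(ids, page_size=100)
count, visited = count_by_scanning(pages)
print(count, visited)                 # 10000 100
```

The work grows linearly with the number of pages, so a key with a very large ID list is exactly the case where this gets expensive.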

The DB_RECNUM count feature is not accessed via dbc->count() but
through dbc->c_get() with DB_GET_RECNO, after positioning the cursor
at the last key. You also need to use nested btrees for it to count
the dups, afaik (but I believe we're doing that in the DS indexes
already).
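The idea of reading a count from a record number can be sketched with a sorted list. If every record has a 1-based position in sort order (its record number), then the record number of the last record equals the total number of records. The number of duplicates for one key is the record number of its last duplicate minus that of its first, plus one. This toy simply assumes every (key, duplicate) pair gets its own record number; whether BDB numbers duplicates that way depends on how the database is configured. It uses bisect, not any BDB call.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data mirroring the keys in the test output: ten unique keys
# plus one key with ten duplicates. Assumption: each (key, value) pair gets
# its own record number, i.e. its 1-based position in sort order.
records = sorted(
    [(f"key{i}", f"data{i}") for i in range(10)]
    + [("multikey", f"multidata{i}") for i in range(10)]
)
keys = [k for k, _ in records]


def recno_of_last(records):
    """Record number of the last record, i.e. the total number of records."""
    return len(records)


def dup_count(keys, key):
    """Duplicates of key = recno(last dup) - recno(first dup) + 1."""
    first = bisect_left(keys, key) + 1  # recno of the first duplicate
    last = bisect_right(keys, key)      # recno of the last duplicate
    return max(0, last - first + 1)


print(recno_of_last(records))        # 20
print(dup_count(keys, "multikey"))   # 10
print(dup_count(keys, "key3"))       # 1
```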

I wrote a small bdbtest.py script which uses the Python bsddb interface.
https://github.com/richm/scripts/blob/master/bdbtest.py

This creates an env, opens a db with 
bsddb.db.DB_DUPSORT|bsddb.db.DB_RECNUM, adds several non-dup and dup 
records, opens a cursor and iterates them.  This is the output:

open dbenv in /var/tmp/dbtest
open db /var/tmp/dbtest/dbtest.db4
no txn records
    key=key0 val=data0
    extra=('', '\x01\x00\x00\x00')
<snip>
    key=key9 val=data9
    extra=('', '\n\x00\x00\x00')
    key=multikey val=multidata0
    extra=('', '\x0b\x00\x00\x00')
<snip>
    key=multikey val=multidata9
    extra=('', '\x0b\x00\x00\x00')

The extra is the str() output of cur.get(bsddb.db.DB_GET_RECNO).

So for all of the dup records, the recno is the same, '\x0b' == 11?
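The 4-byte strings in the extra column can be decoded with struct to confirm the values. This assumes the record number comes back as an unsigned 32-bit integer in little-endian byte order, as it would on an x86 host. The byte strings are copied from the output above.

```python
import struct

# Byte strings copied from the DB_GET_RECNO output above.
samples = {
    "key0": b"\x01\x00\x00\x00",
    "key9": b"\n\x00\x00\x00",
    "multikey": b"\x0b\x00\x00\x00",
}

# Assumption: unsigned 32-bit, little-endian (x86 host byte order).
decoded = {key: struct.unpack("<I", raw)[0] for key, raw in samples.items()}
print(decoded)  # {'key0': 1, 'key9': 10, 'multikey': 11}
```

In Python string escapes, '\x0b' is 11 (vertical tab), whereas '\b' is 8 (backspace).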

I'm probably missing something, but how do I use this to get the number 
of duplicates?

--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel