Hi everybody,
Following Ludwig's and Mark's suggestions on how to perform a database dump in LDIF format from dbscan, I have come up with a strategy. I'm talking about ticket #47567: https://pagure.io/389-ds-base/issue/47567
I'd like to discuss this strategy and get some feedback.
The general idea is:
- We are cursing though id2entry.db printing entries in order.
- Parents should be printed before children.
- Hence if for some entry parenitd > entryid, we have to print its parent first (out of order) and track that we did so.
- In order to do the above, we don't need to move the db cursor. We can just go fetch something from a random place in the db and the cursor will remain in its place, so we can continue from where we left off after we're done with printing a parent.
- We also need to construct and print the DN of each entry using its RDN and the DN of its father.
- Let's use something like a hash map to pair entryids with dns (only for entries that has children), e.g. map[1] = "dc=example,dc=com", map[4] = "ou=People,dc=example,dc=com", etc.
I'll present the algorithm that I came up with in python-like pseudo-code.
First, the following function constructs the entry's DN and updates the hash map if needed. We can know whether an entry is a parent or not, by the presence of the
# assumes that parentid already exists in the map
function display_entry(e, map):
if not e.parentid:
e.dn = e.rdn
else:
e.dn = e.rdn + map[e.parentid]
if isparent(e):
map[e.entryid] = e.dn
print_to_ldif_format(e)
Then, the main loop:
map = new(hashmap)
printed_in_advance = []
for e in entries:
if e.entryid in printed_in_advance:
continue # skip this, we have already displayed it
if e.parentid < e.entryid:
display_entry(e, map)
else:
# we need to display parents before children
list_of_parents = []
p = e
while p.parentid > p.entryid:
p = get_from_db(key = p.parentid)
list_of_parents.append(p) # see note below (*)
for p in reversed(list_of_parents):
display_entry(p, map)
printed_in_advance.append(p.entryid)
* here we store the whole entry in the list (aka its data) and not just
its id, in order to avoid fetching it from the database again
map = new(hashmap)
printed_in_advance = []
for e in entries:
if e.entryid in printed_in_advance:
continue # skip this, we have already displayed it
if e.parentid < e.entryid:
display_entry(e, map)
else:
# we need to display parents before children
list_of_parents = []
p = e
while p.parentid > p.entryid:
p = get_from_db(key = p.parentid)
list_of_parents.append(p) # see note below (*)
for p in reversed(list_of_parents):
display_entry(p, map)
printed_in_advance.append(p.entryid)
* here we store the whole entry in the list (aka its data) and not just
its id, in order to avoid fetching it from the database again
As a map, we can use a B+ tree implementation from libsds.
I would like to know how the above idea sounds to you before going ahead and implementing it.
_______________________________________________ 389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx