On 08/22/2017 01:31 AM, William Brown wrote:
> well, BDB has db_verify to ensure that a db file is consistent in itself
> and can be processed; this should be good enough to decide if it is
> usable.
>
> I have a question / concern though. I thought that we want dbscan-to-LDIF
> for emergency recovery scenarios, when all else has gone bad and assuming
> that id2entry is still readable. In the approach you described we make
> the assumption that the parentid index is readable as well, so we depend
> on two files instead of one for exporting the database. Does this matter,
> or do we not care at all?
>
> There are two scenarios here in my opinion: backup, and emergency
> backup :-)  As I've previously stated, performance is important. It
> should not take forever to process a 100 million entry database. I think
> the tool should use multiple index files (id2entry + friends) if we can
> generate the LDIF faster. But if some of those indexes are corrupted,
> then we need an alternate algorithm to generate it just from id2entry.
> Also, if we are dealing with a corrupted db, then performance is not
> important, recovery is. So if we can do it fast, do it; otherwise grind
> it out.
>
> All that being said, there is something we need to consider, which I
> don't have an answer for: when databases do get corrupted, which files
> typically get corrupted? Is it indexes, or is it id2entry? To be honest,
> database corruption doesn't happen very often, but the tool should be
> smart enough to realize that the data could be inaccurate. Perhaps a
> parent could be missing, etc. So the tool should be robust enough to use
> multiple techniques to complete an entry, and if it can't, it should log
> something, or better yet create a rejects file that an Admin can take
> and repair manually. I know this is getting more complicated, but we
> need to keep these things in mind.
>
> Regards,
> Mark
>
> With the current design of id2entry and friends, we can't automatically
> detect this so easily.
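[The "just from id2entry" fallback above can be sketched without the parentid index at all: since a parent DN always has fewer RDN components than its children, sorting entries by DN depth yields a parent-before-child export order. A minimal illustration, not the actual dbscan/db2ldif code; real code would parse DNs properly rather than count commas, since escaped commas inside an RDN would break this naive key:]

```python
def ldif_order(dns):
    """Order DN strings parent-before-child using only id2entry content.

    dns: list of entry DNs in arbitrary (id2entry cursor) order.
    Sorting on the RDN count ensures every parent precedes its children,
    because a child DN always has at least one more component.
    NOTE: dn.count(",") is a simplification; escaped commas in RDN values
    require a real DN parser.
    """
    return sorted(dns, key=lambda dn: dn.count(","))
```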
> I think we should really just have a flag on dbscan that says "ignore
> everything BUT id2entry" and recover all you can. We should leave this
> to a human to make that call. If our database had proper checksumming of
> content and pages, we could detect this, but today that's not the
> case :(

But, as Mark mentioned backup: if the backup is an online backup, we cannot be sure that id2entry alone is sane; backup/restore relies on backing up the txn logs and running recovery on restore. That said, after a crash we can also have the situation that pages were not flushed from the dbcache. Generating an ldif from id2entry can be a best effort only, and might fail in exactly the situations where it is most needed.

About the general strategy, maybe we can use another approach (yes, always new ideas). In generating the ldif we have to solve the problem that, when cursoring through id2entry, child entries can come before parent entries. And we solve it in different places: total replication init, db2ldif, and now in a utility. Wouldn't it be better to make the ldif import smarter, and stack entries without parents until the parent is read? This would simplify the export and init, and solve the problem in one place.
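The stacking idea above can be sketched roughly as follows. This is a hypothetical illustration, not 389-ds code: entries arrive in id2entry cursor order as (entry id, parent id, DN) tuples (names and shapes are my assumptions); children whose parent has not yet been seen are parked in a pending map keyed by the parent id, and flushed recursively once the parent arrives. Anything still pending at the end goes to a rejects list, matching Mark's suggestion of a rejects file:

```python
from collections import defaultdict

def import_entries(entries):
    """Import entries in cursor order, stacking orphans until their parent appears.

    entries: iterable of (entry_id, parent_id, dn); parent_id is None for roots.
    Returns (imported, orphans): DNs in parent-before-child order, plus DNs
    whose parent never appeared (candidates for a rejects file).
    """
    seen = set()                     # entry ids already imported
    pending = defaultdict(list)      # parent_id -> children waiting for it
    imported = []

    def flush(entry_id):
        # Import every child stacked behind entry_id, recursively, since
        # those children may themselves have waiting descendants.
        for child_id, _pid, child_dn in pending.pop(entry_id, []):
            imported.append(child_dn)
            seen.add(child_id)
            flush(child_id)

    for eid, pid, dn in entries:
        if pid is None or pid in seen:
            imported.append(dn)
            seen.add(eid)
            flush(eid)
        else:
            pending[pid].append((eid, pid, dn))

    orphans = [dn for kids in pending.values() for _eid, _pid, dn in kids]
    return imported, orphans
```

With this in the import path, the export side (total init, db2ldif, the recovery utility) could simply walk id2entry in id order and leave the reordering to the consumer.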
-- Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander |
_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx