Re: Strategy proposal for making DB dump in LDIF format from dbscan

On Tue, 2017-08-22 at 09:03 +0200, Ludwig Krispenz wrote:
> On 08/22/2017 01:31 AM, William Brown wrote:
> >>> I have a question / concern though. I thought that we wanted
> >>> dbscan-to-LDIF for emergency recovery scenarios, when all else has
> >>> gone bad, assuming that id2entry is still readable. In the approach
> >>> you described we assume that the parentid index is readable as
> >>> well, so we depend on two files instead of one for exporting the
> >>> database. Does this matter, or do we not care at all?
> >> There are two scenarios here in my opinion.  Backup, and emergency
> >> backup :-)  As I've previously stated: performance is important.  It
> >> should not take forever to process a 100 million entry database.  I
> >> think the tool should use multiple index files (id2entry + friends) if
> >> we can generate the LDIF faster.  But, if some of those indexes are
> >> corrupted, then we need an alternate algorithm to generate it just from
> >> id2entry.  Also, if we are dealing with a corrupted db, then performance
> >> is not important, recovery is.  So if we can do it fast, do it,
> >> otherwise grind it out.
> >>
> >> All that being said, there is something we need to consider, which I
> >> don't have an answer for: when databases do get corrupted, which
> >> files typically get corrupted? Is it the indexes, or is it id2entry?
> >> To be honest, database corruption doesn't happen very often, but the
> >> tool should be smart enough to realize that the data could be
> >> inaccurate. Perhaps a parent could be missing, etc. So the tool
> >> should be robust enough to use multiple techniques to complete an
> >> entry, and if it can't, it should log something, or better yet create
> >> a rejects file that an admin can take and repair manually.
> >>
> >> I know this is getting more complicated, but we need to keep these
> >> things in mind.
> >>
> >> Regards,
> >> Mark
> > With the current design of id2entry and friends, we can't automatically
> > detect this so easily. I think we should really just have a flag on
> > dbscan that says "ignore everything BUT id2entry" and recover all you
> > can. We should leave this to a human to make that call.
> >
> > If our database had proper checksumming of content and pages, we could
> > detect this, but today that's not the case :(
> Well, BDB has db_verify to ensure that a db file is internally consistent
> and can be processed; this should be good enough to decide if it is usable.
> But, as Mark mentioned backup: if the backup is an online backup, we cannot
> be sure that id2entry alone is sane, because backup/restore relies on
> backing up the txn logs and running recovery on restore. That said, after a
> crash we can also have the situation that pages were not flushed from the
> dbcache. Generating an LDIF from id2entry can only be best effort, and it
> might fail in the situations where it is most needed.
> 
> About the general strategy, maybe we can use another approach (yes, 
> always new ideas).
> When generating the LDIF we have to solve the problem that, when cursoring
> through id2entry, child entries can come before their parent entries. And
> we solve it in several places: total replication init, db2ldif, and now in
> a utility. Wouldn't it be better to make LDIF import smarter, and stack
> entries without parents until the parent is read? This would simplify
> export and init, and the problem would be solved in one place.
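A minimal sketch of the "stack entries without parents" idea, in Python for illustration only. It assumes a hypothetical record shape of (entry_id, parent_id, dn) rather than the real id2entry format, and the function name import_in_parent_order is made up. Each entry is imported once its parent has been seen; children that arrive early are parked under their parent's ID, and whatever is still parked at the end has no path back to a suffix, which is what Mark's rejects file would contain.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record shape: (entry_id, parent_id, dn); parent_id is None
# for a suffix entry. This is NOT the on-disk id2entry format.
Entry = Tuple[int, Optional[int], str]


def import_in_parent_order(
    entries: Iterable[Entry],
) -> Tuple[List[Entry], List[Entry]]:
    """Return (imported, rejects), deferring children until their parent is imported."""
    imported: List[Entry] = []
    seen = set()  # IDs of entries already imported
    waiting: Dict[int, List[Entry]] = defaultdict(list)  # parent_id -> parked children

    def add(entry: Entry) -> None:
        # Import the entry, then release any children parked on it. A work
        # stack is used instead of recursion so deep subtrees are safe.
        stack = [entry]
        while stack:
            current = stack.pop()
            imported.append(current)
            seen.add(current[0])
            stack.extend(reversed(waiting.pop(current[0], [])))

    for entry in entries:
        _entry_id, parent_id, _dn = entry
        if parent_id is None or parent_id in seen:
            add(entry)
        else:
            waiting[parent_id].append(entry)

    # Anything still parked never connected back to a suffix entry
    # (missing parent, or a descendant of one): candidates for a rejects file.
    rejects = [e for children in waiting.values() for e in children]
    return imported, rejects
```

Because parked children are released from a work stack rather than by recursion, a deep subtree that arrives entirely before its root does not run into Python's recursion limit.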


It sounds like we could add a mechanism to id2entry to solve this, with
flags to use entryrdn or other related dbs to speed up the sort. This
would satisfy the request that we do this "in one place, and one place
well".
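A rough sketch of "one place" with a flag, again in Python and under stated assumptions: records are hypothetical (entry_id, dn) pairs with consistently normalized DNs (no spaces after commas, no quoted values), and parent_of is a stand-in for whatever parentid/entryrdn data is available. It is not an existing dbscan or 389-ds option. With parent_of, the tree is walked from the suffix entry down (the fast path); without it, the routine falls back to id2entry alone and sorts by DN depth, since a parent DN always has fewer RDNs than its children. Both paths return the entries they could not place as rejects.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical record shape: (entry_id, normalized_dn), as read from id2entry.
Entry = Tuple[int, str]


def split_dn(dn: str) -> List[str]:
    """Split a DN into RDNs at unescaped commas (quoted values are not handled)."""
    parts: List[str] = []
    current: List[str] = []
    escaped = False
    for ch in dn:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def order_parents_first(
    entries: Iterable[Entry],
    suffix: str,
    parent_of: Optional[Dict[int, Optional[int]]] = None,
) -> Tuple[List[Entry], List[Entry]]:
    """Return (ordered, rejects) with every parent ahead of its children."""
    entries = list(entries)
    ordered: List[Entry] = []

    if parent_of is None:
        # id2entry-only fallback: sort by RDN count, then keep an entry only
        # if it is the suffix or its parent DN has already been kept.
        kept: Set[str] = set()
        rejects: List[Entry] = []
        for entry in sorted(entries, key=lambda e: len(split_dn(e[1]))):
            rdns = split_dn(entry[1])
            if entry[1] == suffix or ",".join(rdns[1:]) in kept:
                ordered.append(entry)
                kept.add(entry[1])
            else:
                rejects.append(entry)
        return ordered, rejects

    # Fast path: build a child map from the parent index (hypothetical
    # stand-in for parentid/entryrdn) and walk it from the suffix down.
    by_id = {entry[0]: entry for entry in entries}
    children: Dict[int, List[int]] = {}
    for entry_id in by_id:
        parent_id = parent_of.get(entry_id)
        if parent_id is not None:
            children.setdefault(parent_id, []).append(entry_id)

    reached: Set[int] = set()
    stack = [entry_id for entry_id, dn in entries if dn == suffix]
    while stack:
        entry_id = stack.pop()
        if entry_id in reached:  # guard against cycles in a damaged index
            continue
        reached.add(entry_id)
        ordered.append(by_id[entry_id])
        stack.extend(children.get(entry_id, []))

    rejects = [entry for entry in entries if entry[0] not in reached]
    return ordered, rejects
```

Tracking reached entries means a cycle in a damaged index cannot make the walk loop forever; entries on a cycle that never connects to the suffix end up in rejects.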


Perhaps Ilias should investigate the other places where we try to sort
by parent to get an idea of what we might need to change? 


-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Australia/Brisbane


_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
