On 04/12/2017 06:58 PM, William Brown wrote:
And in reality it's 20 million entries (master + replica).

> On Wed, 2017-04-12 at 17:02 -0400, Mark Reynolds wrote:
>> Hello,
>>
>> This is a beta version of a replication diff tool written in python.
>>
>> Design page (this needs updating - I hope to get that done tonight):
>>
>> http://www.port389.org/docs/389ds/design/repl-diff-tool-design.html
>>
>> Current usage:
>>
>>   -v, --verbose                    Verbose output
>>   -o FILE, --outfile=FILE          The output file
>>   -D BINDDN, --binddn=BINDDN       The Bind DN (REQUIRED)
>>   -w BINDPW, --bindpw=BINDPW       The Bind password (REQUIRED)
>>   -h MHOST, --master_host=MHOST    The Master host (default localhost)
>>   -p MPORT, --master_port=MPORT    The Master port (default 389)
>>   -H RHOST, --replica_host=RHOST   The Replica host (REQUIRED)
>>   -P RPORT, --replica_port=RPORT   The Replica port (REQUIRED)
>>   -b SUFFIX, --basedn=SUFFIX       Replicated suffix (REQUIRED)
>>   -l LAG, --lagtime=LAG            The amount of time to ignore
>>                                    inconsistencies (default 300 seconds)
>>   -Z CERTDIR, --certdir=CERTDIR    The certificate database directory
>>                                    for startTLS connections
>>   -i IGNORE, --ignore=IGNORE       Comma-separated list of attributes
>>                                    to ignore
>>   -M MLDIF, --mldif=MLDIF          Master LDIF file (offline mode)
>>   -R RLDIF, --rldif=RLDIF          Replica LDIF file (offline mode)
>>
>> Examples:
>>
>>   python repl-diff.py -D "cn=directory manager" -w PASSWORD -h localhost -p 389 -H remotehost -P 5555 -b "dc=example,dc=com"
>>
>>   python repl-diff.py -D "cn=directory manager" -w PASSWORD -h localhost -p 389 -H remotehost -P 5555 -b "dc=example,dc=com" -Z /etc/dirsrv/slapd-localhost
>>
>>   python repl-diff.py -M /tmp/master.ldif -R /tmp/replica.ldif
>>
>> How long the tool takes to run depends on the number of entries per
>> database.  See the performance numbers below:
>>
>>   Entries per Replica    Time
>>   ---------------------------------
>>   100k                   40 seconds
>>   500k                   3 min 30 sec
>>   1 million              7 min 30 sec
>>   2 million              14 minutes
>>   10 million             ~70 minutes
>>
>> I'd be very interested in feedback, RFEs, and bugs.
>
> Hey mate,
>
> The tool looks great, awesome work on this. Really impressive that you
> got it to 70 minutes for 10 million entries.

Not really :)  We don't have to rely on server-side sorting, and it's
just a paged result search, so it breaks up the load (slightly).  It's
still expensive because it returns all the entries, but I didn't see any
extreme CPU usage.  (The paged-search pattern is sketched at the end of
this message.)

> How responsive is the server during this process? We aren't going to
> cause some odd resource exhaustion?

Easy, no problem.

>> import optparse
>
> With python, optparse is deprecated. Can we use argparse instead? It's
> nearly identical. Lots of examples of this in dsctl.

Good idea.  (A minimal argparse skeleton is also sketched below.)

> When connecting to replicas, some sites may only have ldaps (provided
> by a load balancer). So our scripts should really be taking an LDAP
> URL, a certdir, and a starttls flag, because ldaps://localhost +
> certdir is a valid option, but if we unconditionally call
> start_tls_s(), we break it.

I think it should always be a standalone tool, but we can tie it in with
lib389 (move the main guts out of the tool and into lib389).  (A
connection helper along those lines is sketched below as well.)

> As well, someone may use ldapi:// etc. It also saves on port options
> and extra flags on the cli, because we can do ldap://localhost:30389
> etc.
>
> Hope that helps, I'll be happy to review again later! For now, I think
> our strategy with this should be to add it to 389-ds-base, and later we
> can move it into lib389 when we can. How does that sound?
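For reference, the paged result search mentioned above follows roughly
this pattern with python-ldap (a simplified sketch, not the tool's exact
code; PAGE_SIZE and the paged_search() helper name are illustrative):

    import ldap
    from ldap.controls import SimplePagedResultsControl

    PAGE_SIZE = 1000  # illustrative value, not necessarily what the tool uses

    def paged_search(conn, basedn):
        # Pull the entries one page at a time so the server never has to
        # sort or hand back the whole database in a single response.
        ctrl = SimplePagedResultsControl(True, size=PAGE_SIZE, cookie='')
        entries = []
        while True:
            msgid = conn.search_ext(basedn, ldap.SCOPE_SUBTREE,
                                    '(objectclass=*)', serverctrls=[ctrl])
            rtype, rdata, rmsgid, rctrls = conn.result3(msgid)
            entries.extend(rdata)
            # An empty cookie from the server means there are no more pages.
            pctrls = [c for c in rctrls
                      if c.controlType == SimplePagedResultsControl.controlType]
            if not pctrls or not pctrls[0].cookie:
                return entries
            ctrl.cookie = pctrls[0].cookie

Because each page is an independent response, the server never has to
materialize the full result set at once, which fits the modest CPU usage
reported above.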
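For the optparse to argparse change, a minimal skeleton covering a few of
the flags above could look like the following (a sketch, not the tool's
actual code).  One wrinkle: argparse reserves -h for --help by default,
so reusing -h for the master host means passing add_help=False or
picking a different letter:

    import argparse

    # Keep -h free for --master_host by disabling the built-in help flag.
    parser = argparse.ArgumentParser(description='Replication diff tool',
                                     add_help=False)
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='Verbose output')
    parser.add_argument('-D', '--binddn', help='The Bind DN')
    parser.add_argument('-w', '--bindpw', help='The Bind password')
    parser.add_argument('-h', '--master_host', default='localhost',
                        help='The Master host (default localhost)')
    parser.add_argument('-p', '--master_port', type=int, default=389,
                        help='The Master port (default 389)')
    parser.add_argument('-b', '--basedn', help='Replicated suffix')
    parser.add_argument('-l', '--lagtime', type=int, default=300,
                        help='Seconds of inconsistency to ignore '
                             '(default 300)')
    args = parser.parse_args()

The remaining flags follow the same shape; the REQUIRED ones are left
optional here only because offline (-M/-R) mode would not need them.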
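And for the LDAP URL suggestion, a connection helper along the lines
William describes might look like this (a hedged sketch; the connect()
name and its arguments are illustrative, not an agreed interface).  It
accepts ldap://, ldaps://, and ldapi:// URLs, and only issues
start_tls_s() on a plain ldap:// connection when explicitly requested,
so an ldaps://host + certdir combination keeps working:

    import ldap
    import ldapurl

    def connect(url, certdir=None, starttls=False):
        # Validate the URL up front; ldapurl raises ValueError on junk.
        parsed = ldapurl.LDAPUrl(url)
        if certdir:
            # Point the TLS layer at the certificate database directory.
            ldap.set_option(ldap.OPT_X_TLS_CACERTDIR, certdir)
        conn = ldap.initialize(url)
        # Only upgrade plain ldap:// connections; forcing start_tls_s()
        # on an already-encrypted ldaps:// connection would break it.
        if starttls and parsed.urlscheme == 'ldap':
            conn.start_tls_s()
        return conn

    # e.g. connect('ldaps://lb.example.com:636',
    #              certdir='/etc/dirsrv/slapd-localhost')
    # or   connect('ldap://localhost:30389', starttls=True)

This also covers the ldapi:// case for free, since the URL carries the
scheme, host, and port that would otherwise each need their own flag.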