Re: [PATCH v3 19/20] refs: add LMDB refs backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[I don't know too much about the refs code, but was still interested
in this patch series, so below are the few things that I noticed while
reading through it.]

On 01/14, David Turner wrote:
> Add a database backend for refs using LMDB.  This backend runs git
> for-each-ref about 30% faster than the files backend with fully-packed
> refs on a repo with ~120k refs.  It's also about 4x faster than using
> fully-unpacked refs.  In addition, and perhaps more importantly, it
> avoids case-conflict issues on OS X.
>
> LMDB has a few features that make it suitable for usage in git:
>
> 1. It is relatively lightweight; it requires only one header file, and
> the library code takes under 64k at runtime.
>
> 2. It is well-tested: it's been used in OpenLDAP for years.
>
> 3. It's very fast.  LMDB's benchmarks show that it is among
> the fastest key-value stores.
>
> 4. It has a relatively simple concurrency story; readers don't
> block writers and writers don't block readers.
>
> Ronnie Sahlberg's original version of this patchset used tdb.  The
> major disadvantage of tdb is that tdb is hard to build on OS X.  It's
> also not in homebrew.  So lmdb seemed simpler.
>
> To test this backend's correctness, I hacked test-lib.sh and
> test-lib-functions.sh to run all tests under the refs backend. Dozens
> of tests use manual ref/reflog reading/writing, or create submodules
> without passing --ref-storage to git init.  If those tests are
> changed to use the update-ref machinery or test-refs-lmdb-backend (or,
> in the case of packed-refs, corrupt refs, and dumb fetch tests, are
> skipped), the only remaining failing tests are the git-new-workdir
> tests and the gitweb tests.
>
> Signed-off-by: David Turner <dturner@xxxxxxxxxxxxxxxx>
> ---
>  .gitignore                                     |    1 +
>  Documentation/config.txt                       |    7 +
>  Documentation/git-clone.txt                    |    3 +-
>  Documentation/git-init.txt                     |    2 +-
>  Documentation/technical/refs-lmdb-backend.txt  |   52 +
>  Documentation/technical/repository-version.txt |    5 +
>  Makefile                                       |   12 +
>  config.c                                       |   29 +
>  configure.ac                                   |   33 +
>  contrib/workdir/git-new-workdir                |    3 +
>  refs.h                                         |    2 +
>  refs/lmdb-backend.c                            | 2051 ++++++++++++++++++++++++
>  setup.c                                        |   11 +-
>  test-refs-lmdb-backend.c                       |   64 +
>  transport.c                                    |    7 +-
>  15 files changed, 2275 insertions(+), 7 deletions(-)
>  create mode 100644 Documentation/technical/refs-lmdb-backend.txt
>  create mode 100644 refs/lmdb-backend.c
>  create mode 100644 test-refs-lmdb-backend.c
>

[...]

> diff --git a/Documentation/technical/refs-lmdb-backend.txt b/Documentation/technical/refs-lmdb-backend.txt
> new file mode 100644
> index 0000000..c328bfc
> --- /dev/null
> +++ b/Documentation/technical/refs-lmdb-backend.txt
> @@ -0,0 +1,52 @@
> +Notes on the LMDB refs backend
> +==============================
> +
> +Design:
> +------
> +
> +Refs and reflogs are stored in a lmdb database in .git/refdb.  All
> +keys and values are \0-terminated.
> +
> +Keys for refs are the name of the ref (e.g. refs/heads/master).
> +Values are the value of the ref, in hex
> +(e.g. 61f23eb0f81357c19fa91e2b8c6f3906c3a8f9b0).
> +
> +All per-worktree refs (refs/bisect/* and HEAD) are stored using
> +the traditional files-based backend.
> +
> +Reflogs are stored as a series of database entries.
> +
> +For non-empty reflogs, there is one entry per logged ref update.  The
> +key format is logs/[refname]\0[timestamp].  The timestamp is a 64-bit
> +unsigned integer number of nanoseconds since 1/1/1970, stored in
> +network byte order.  This means that reflog entries are
> +chronologically ordered.  Because LMDB is a btree database, we can
> +efficiently iterate over these keys.
> +
> +For an empty reflog, there is a "header" entry to show that a reflog
> +exists.  The header has the same format as an ordinary reflog, but with
> +a timeztamp of all zeros and an empty value.

s/timeztamp/timestamp/

> +
> +Reflog values are in the same format as the original files-based
> +reflog, including the trailing LF. The date in the reflog value
> +matches the date in the timestamp field.
> +
> +Weaknesses:
> +-----------
> +
> +The reflog format is somewhat inefficient: a binary format could store
> +reflog date/time information in somewhat less space.
> +
> +The rsync and file:// transports don't work yet, because they
> +don't use the refs API.
> +
> +git new-workdir is incompatible with the lmdb backend.  Fortunately,
> +git new-workdir is deprecated, and worktrees work fine.
> +
> +LMDB locks the entire database for write.  Any other writer waits
> +until the first writer is done before beginning.  Readers do not wait
> +for writers, and writers do not wait for readers.  The underlying
> +scheme is approximately MVCC; each reader's queries see the state of
> +the database as-of the time that the reader acquired its read lock.
> +This is not too far off from the files backend, which loads all refs
> +into memory when one is requested.

[...]

> diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
> index 888c34a..c035a43 100755
> --- a/contrib/workdir/git-new-workdir
> +++ b/contrib/workdir/git-new-workdir
> @@ -28,6 +28,9 @@ git_dir=$(cd "$orig_git" 2>/dev/null &&
>    git rev-parse --git-dir 2>/dev/null) ||
>    die "Not a git repository: \"$orig_git\""
>
> +
> +test "$(git config extensions.refstorage)" = "lmdb" && die "git-new-workdir is incompatible with the refs lmdb storage"
> +

Is it expected that other potential ref backends are compatible with
git-new-workdir?  Otherwise I think it would make more sense to
whitelist the files backend here instead of blacklisting the lmdb
backend, so we don't risk forgetting about this when adding another
backend.

>  case "$git_dir" in
>  .git)
>  	git_dir="$orig_git/.git"

[...]

> +static int lmdb_init_db(struct strbuf *err, int shared)
> +{
> +	/*
> +	 * To create a db, all we need to do is make a directory for
> +	 * it to live in; lmdb will do the rest.
> +	 */
> +
> +	if (!db_path)
> +		db_path = xstrdup(real_path(get_refdb_path(get_git_common_dir())));

I think we're leaking some memory from get_refdb_path() here.
get_refdb_path() uses strbuf_detach(), which according to its docstring
makes its caller take care of the memory of the returned string.
real_path() then uses strbuf_addstr() to add the string to its
internal strbuf, but leaves the string we get from get_refdb_path()
alone, so it leaks.

> +
> +	if (mkdir(db_path, 0775) && errno != EEXIST) {
> +		strbuf_addf(err, "%s", strerror(errno));
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +

[...]

> +/*
> + * Begin a transaction. Because only one transaction per thread is
> + * permitted, we use a global transaction object.  If a read-write
> + * transaction is presently already in-progress, and a read-only
> + * transaction is requested, the read-write transaction will be
> + * returned instead.  If a read-write transaction is requested and a
> + * read-only transaction is open, the read-only transaction will be
> + * closed.
> + *
> + * It is a bug to request a read-write transaction during another
> + * read-write transaction.
> + *
> + * As a result, it is unsafe to retain read-only transactions past the
> + * point where a read-write transaction might be needed.  For
> + * instance, any call that has callbacks outside this module must
> + * conclude all of its reads from the database before calling those
> + * callbacks, or must reacquire the transaction after its callbacks
> + * are completed.
> + */
> +int lmdb_transaction_begin_flags(struct strbuf *err, unsigned int flags)
> +{
> +	int ret;
> +	MDB_txn *txn;
> +	static size_t last_txnid = 0;
> +	int force_restart = 0;
> +	MDB_envinfo stat;
> +
> +	if (!db_path)
> +		db_path = xstrdup(real_path(get_refdb_path(get_git_common_dir())));

Same comment about leaking memory from get_refdb_path() as above applies.

> +	init_env(&env, db_path);
> +
> +	/*
> +	 * Since each transaction sees a consistent view of the db,
> +	 * downstream processes that write the db won't be seen in
> +	 * this transaction.  We can check if the last transaction id
> +	 * has changed since this read transaction was started, and if
> +	 * so, we want to reopen the transaction.
> +	 */
> +
> +	mdb_env_info(env, &stat);
> +	if (stat.me_last_txnid != last_txnid) {
> +		force_restart = 1;
> +		last_txnid = stat.me_last_txnid;
> +	}
> +
> +	if (!transaction.txn) {
> +		ret = mdb_txn_begin(env, NULL, flags, &txn);
> +		if (ret) {
> +			strbuf_addf(err, "mdb_txn_begin failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		ret = mdb_dbi_open(txn, NULL, 0, &transaction.dbi);
> +		if (ret) {
> +			strbuf_addf(err, "mdb_txn_open failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		transaction.txn = txn;
> +		transaction.flags = flags;
> +		return 0;
> +	}
> +
> +	if (transaction.flags == flags && !(flags & MDB_RDONLY))
> +		die("BUG: rw transaction started during another rw txn");
> +
> +	if (force_restart || (transaction.flags != flags && transaction.flags & MDB_RDONLY)) {
> +		/*
> +		 * RO -> RW, or forced restart due to possible changes
> +		 * from downstream processes.
> +		 */
> +		mdb_txn_abort(transaction.txn);
> +		ret = mdb_txn_begin(env, NULL, flags, &txn);
> +		if (ret) {
> +			strbuf_addf(err, "restarting txn: mdb_txn_begin failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		ret = mdb_dbi_open(txn, NULL, 0, &transaction.dbi);
> +		if (ret) {
> +			strbuf_addf(err, "mdb_txn_open failed: %s",
> +				    mdb_strerror(ret));
> +			return -1;
> +		}
> +		transaction.txn = txn;
> +		transaction.flags = flags;
> +	}
> +	/* RW -> RO just keeps the RW txn */
> +	return 0;
> +}

[...]
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]