Re: [PATCH v4 20/21] refs: add LMDB refs storage backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2016-02-12 at 18:01 +0100, Michael Haggerty wrote:
> On 02/05/2016 08:44 PM, David Turner wrote:
> > Add a database backend for refs using LMDB.  This backend runs git
> > for-each-ref about 30% faster than the files backend with fully
> > -packed
> > refs on a repo with ~120k refs.  It's also about 4x faster than
> > using
> > fully-unpacked refs.  In addition, and perhaps more importantly, it
> > avoids case-conflict issues on OS X.
> > 
> > LMDB has a few features that make it suitable for usage in git:
> > 
> 
> 0. It is licensed under the OpenLDAP Public License, which is
> apparently
> GPL compatible [1].
> 
> [1] http://www.gnu.org/licenses/license-list.en.html#newOpenLDAP

Oh, yeah, I checked that before choosing it and then forgot all about
it. 

> > 1. It is relatively lightweight; it requires only one header file,
> > and
> > the library code takes under 64k at runtime.
> > 
> > 2. It is well-tested: it's been used in OpenLDAP for years.
> > 
> > 3. It's very fast.  LMDB's benchmarks show that it is among
> > the fastest key-value stores.
> > 
> > 4. It has a relatively simple concurrency story; readers don't
> > block writers and writers don't block readers.
> > 
> > Ronnie Sahlberg's original version of this patchset used tdb.  The
> > major disadvantage of tdb is that tdb is hard to build on OS X. 
> >  It's
> > also not in homebrew.  So lmdb seemed simpler.
> > 
> > To test this backend's correctness, I hacked test-lib.sh and
> > test-lib-functions.sh to run all tests under the refs backend.
> > Dozens
> > of tests use manual ref/reflog reading/writing, or create
> > submodules
> > without passing --ref-storage to git init.  If those tests are
> > changed to use the update-ref machinery or test-refs-lmdb-backend
> > (or,
> > in the case of packed-refs, corrupt refs, and dumb fetch tests, are
> > skipped), the only remaining failing tests are the git-new-workdir
> > tests and the gitweb tests.
> 
> Would it possible to build your "hack" into the test suite? For
> example,
> perhaps one could implement an option that requests tests to use LMDB
> wherever possible and skip the tests that are not LMDB-compatible.
> Over
> time, we could try to increase the fraction of tests that are
> LMDB-compatible by (for example) using `git update-ref` and `git
> rev-parse` rather than reading/writing reference files via the
> filesystem directly.
> 
> Otherwise, I'm worried that LMDB support will bitrot quickly because
> it
> will be too much of a nuisance to exercise the code.

I can add that.  It requires minor changes to a large number of tests,
but that's OK.

> > Signed-off-by: David Turner <dturner@xxxxxxxxxxxxxxxx>
> > ---
> >  .gitignore                                     |    1 +
> >  Documentation/config.txt                       |    7 +
> >  Documentation/git-clone.txt                    |    3 +-
> >  Documentation/git-init.txt                     |    2 +-
> >  Documentation/technical/refs-lmdb-backend.txt  |   52 +
> >  Documentation/technical/repository-version.txt |    5 +
> >  Makefile                                       |   12 +
> >  configure.ac                                   |   33 +
> >  contrib/workdir/git-new-workdir                |    3 +
> >  refs.c                                         |    3 +
> >  refs.h                                         |    2 +
> >  refs/lmdb-backend.c                            | 2052
> > ++++++++++++++++++++++++
> >  setup.c                                        |   11 +-
> >  test-refs-lmdb-backend.c                       |   64 +
> >  transport.c                                    |    7 +-
> >  15 files changed, 2250 insertions(+), 7 deletions(-)
> >  create mode 100644 Documentation/technical/refs-lmdb-backend.txt
> >  create mode 100644 refs/lmdb-backend.c
> >  create mode 100644 test-refs-lmdb-backend.c
> > 
> > diff --git a/.gitignore b/.gitignore
> > index 1c2f832..87d45a2 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -199,6 +199,7 @@
> >  /test-path-utils
> >  /test-prio-queue
> >  /test-read-cache
> > +/test-refs-lmdb-backend
> >  /test-regex
> >  /test-revision-walking
> >  /test-run-command
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 06d3659..bc222c9 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -1153,6 +1153,13 @@ difftool.<tool>.cmd::
> >  difftool.prompt::
> >  	Prompt before each invocation of the diff tool.
> >  
> > +extensions.refsStorage::
> > +	Type of refs storage backend. Default is to use the
> > original
> > +	files based ref storage.  When set to "lmdb", refs are
> > stored in
> > +	the lmdb database backend.  This setting reflects the refs
> > +	backend that is currently in use; it is not possible to
> > change
> > +	the backend by changing this setting.
> > +
> 
> Let's give the users a pointer to how they *can* change this setting.
> Maybe
> 
> 	Type of refs storage backend. Default is to use the original
> 	files based ref storage.  When set to "lmdb", refs are stored
> in
> 	an LMDB database.  This setting reflects the refs storage
> 	backend that was chosen via the --ref-storage option when the
> 	repository was originally created. It is currently
> 	not possible to change the refs storage backend of an
> 	existing repository.

OK.

> >  fetch.recurseSubmodules::
> >  	This option can be either set to a boolean value or to 'on
> > -demand'.
> >  	Setting it to a boolean changes the behavior of fetch and
> > pull to
> > diff --git a/Documentation/git-clone.txt b/Documentation/git
> > -clone.txt
> > index 68f56a7..b9c52ce 100644
> > --- a/Documentation/git-clone.txt
> > +++ b/Documentation/git-clone.txt
> > @@ -227,7 +227,8 @@ objects from the source repository into a pack
> > in the cloned repository.
> >  
> >  --ref-storage=<name>::
> >  	Type of ref storage backend. Default is to use the
> > original files
> > -	based ref storage.
> > +	based ref storage. Set to "lmdb" to store refs in the lmdb
> > database
> > +	backend.
> 
> Should the second "lmdb" be upper-case? Maybe also in other places
> where
> the LMDB database is being referred to?

OK, I think I got most of them.

> >  <repository>::
> >  	The (possibly remote) repository to clone from.  See the
> > [...]
> 
> I still haven't reviewed the actual lmdb-backend code. I guess I'm
> going
> to have to make myself an LMDB expert one of these days...

I highly recommend it!  The API is pretty reasonable, and it does seem
to be fast.  The error messages could use some work, but that's my only
complaint.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]