On 7/31/2017 5:02 PM, Jonathan Tan wrote:
Besides review changes, this patch set now includes my rewritten
lazy-loading sha1_file patch, so you can now do this (excerpted from one
of the tests):
    test_create_repo server
    test_commit -C server 1 1.t abcdefgh
    HASH=$(git hash-object server/1.t)

    test_create_repo client
    test_must_fail git -C client cat-file -p "$HASH"
    git -C client config core.repositoryformatversion 1
    git -C client config extensions.lazyobject \
        "\"$TEST_DIRECTORY/t0410/lazy-object\" \"$(pwd)/server/.git\""
    git -C client cat-file -p "$HASH"
with fsck still working. Also, there is no need for a list of promised
blobs, and the long-running process protocol is being used.
Changes from v1:
- added last patch that supports lazy loading
- clarified documentation in "introduce lazyobject extension" patch
(following Junio's comments [1])
As listed in the changes above, I have rewritten my lazy-loading
sha1_file patch to no longer use the list of promises. Also, I have
added documentation about the protocol used to (hopefully) the
appropriate places.
Glad to see the removal of the promises. Given the ongoing
conversation, I'm interested to see how you are detecting locally created
objects vs. those downloaded from a server.
This is a minimal implementation, hopefully enough of a foundation to be
built upon. In particular, I haven't added the environment variable to
suppress lazy loading, and the lazy loading protocol only supports one
object at a time.
We can add multiple object support to the protocol when we get to the
point that we have code that will utilize it.
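For readers unfamiliar with the long-running process protocol mentioned
above: it frames every message in git's pkt-line format, where each
packet starts with four hex digits giving the total packet length
(payload plus the four length bytes) and "0000" is a flush packet that
ends a list. The sketch below shows only that framing. The helper names
pkt_line and pkt_flush are made up for illustration, and the actual
messages exchanged with the lazy-object command are defined in the patch
itself and are not reproduced here.

```shell
#!/bin/sh
# Hypothetical helpers illustrating pkt-line framing only; they are not
# part of the patch set.

# Emit one pkt-line: 4 hex digits (payload length + 4), then the payload.
# Note: ${#1} counts characters, so this assumes a single-byte (ASCII)
# payload.
pkt_line () {
	printf '%04x%s' $(( ${#1} + 4 )) "$1"
}

# Emit a flush packet, which terminates a list of pkt-lines.
pkt_flush () {
	printf '0000'
}
```

Extending the protocol to multiple objects per request would then
presumably be a matter of sending several pkt-lines before the flush
packet, but that is a guess about a future design, not something this
series does.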
Other work
----------
This differs slightly from Ben Peart's patch [2] in that the
lazy-loading functionality is provided through a configured shell
command instead of a hook shell script. I envision commands like "git
clone", in the future, needing to pre-configure lazy loading, and I
think that it will be less surprising to the user if "git clone" wrote a
default configuration instead of a default hook.
This was on my "todo" list to investigate as I've been told it can
enable people to use taskset to set CPU affinity and get some
significant performance wins. I'd be interested to see if it actually
helps here at all.
This also differs from Christian Couder's patch set [3] that implements a
larger-scale object database, in that (i) my patch set does not support
putting objects into external databases, and (ii) my patch set requires
the lazy loader to make the objects available in the local repo, instead
of allowing the objects to only be stored in the external database.
This is the model we're using today so I'm confident it will meet our
requirements.
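To make requirement (ii) concrete, here is a rough sketch of what
"making the object available in the local repo" could look like when
the source is another repository on disk, as in the test excerpt. This
is not the t0410/lazy-object helper from the patch; copy_object is a
hypothetical name, and the sketch ignores the long-running process
protocol entirely. It uses only "git cat-file" and "git hash-object".

```shell
#!/bin/sh
# Hypothetical sketch: copy one object from a source git directory into
# a destination git directory as a loose object.
# Usage: copy_object <src-git-dir> <dst-git-dir> <object-id>
copy_object () {
	src=$1 dst=$2 oid=$3

	# Ask the source repository for the object's type (blob, tree, ...).
	type=$(git --git-dir="$src" cat-file -t "$oid") || return 1

	# Stream the raw object contents into the destination repository.
	new=$(git --git-dir="$src" cat-file "$type" "$oid" |
		git --git-dir="$dst" hash-object -w -t "$type" --stdin) || return 1

	# The destination must end up with the same object ID.
	test "$new" = "$oid"
}
```

After such a copy, the object is stored locally, so later reads do not
need the external source at all.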
[1] https://public-inbox.org/git/xmqqzibpn1zh.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
[2] https://public-inbox.org/git/20170714132651.170708-2-benpeart@xxxxxxxxxxxxx/
[3] https://public-inbox.org/git/20170620075523.26961-1-chriscool@xxxxxxxxxxxxx/
Jonathan Tan (5):
  environment, fsck: introduce lazyobject extension
  fsck: support refs pointing to lazy objects
  fsck: support referenced lazy objects
  fsck: support lazy objects as CLI argument
  sha1_file: support loading lazy objects

 Documentation/Makefile                          |   1 +
 Documentation/gitattributes.txt                 |  54 ++--------
 Documentation/gitrepository-layout.txt          |   3 +
 .../technical/long-running-process-protocol.txt |  50 +++++++++
 Documentation/technical/repository-version.txt  |  23 +++++
 Makefile                                        |   1 +
 builtin/cat-file.c                              |   2 +
 builtin/fsck.c                                  |  25 ++++-
 cache.h                                         |   4 +
 environment.c                                   |   1 +
 lazy-object.c                                   |  80 +++++++++++++++
 lazy-object.h                                   |  12 +++
 object.c                                        |   7 ++
 object.h                                        |  13 +++
 setup.c                                         |   7 +-
 sha1_file.c                                     |  44 +++++---
 t/t0410-lazy-object.sh                          | 113 +++++++++++++++++++++
 t/t0410/lazy-object                             | 102 +++++++++++++++++++
 18 files changed, 478 insertions(+), 64 deletions(-)
 create mode 100644 Documentation/technical/long-running-process-protocol.txt
 create mode 100644 lazy-object.c
 create mode 100644 lazy-object.h
 create mode 100755 t/t0410-lazy-object.sh
 create mode 100755 t/t0410/lazy-object