This is based on ds/sparse-allow-empty-working-tree. BACKGROUND ========== After discussing the future of sparse-checkout at the Git Contributor Summit, Minh Tai caught me during the break to introduce a very interesting idea: in-tree sparse-checkout definitions. Here is the context: I was talking about how the Office team is building a tool to update the sparse-checkout definition dynamically as their build definitions change. They have a very compartmentalized build system, and certain "projects" depend on each other. To ensure that builds do not break as dependencies change, the plan was to have this tool run as a hook when HEAD changes. This is a bit expensive to do, and we were discussing how much we need this to be automated versus a part of the build process. This kind of idea would need to be replicated for every team that wants to use sparse-checkout in this way. That doesn't bode well for wide adoption of the feature. IN-TREE SPARSE-CHECKOUT DEFINITIONS =================================== Minh's idea was simple: have sparse-checkout files in the working directory and use config to point to them. As these in-tree files update, we can automatically update the sparse-checkout definition accordingly. Now, the only thing to do would be to ensure that the sparse-checkout files are updated when someone updates the build definitions. This requires some extra build validation, but would not require special tools built on every client. As soon as I sat down to implement this idea, I discovered some complications to that basic idea. Each of these sparse-checkout definitions should fit a "role" on the team. To continue with the Office analogy, suppose we had a definition for developers working on each of Word, PowerPoint, and Excel. What happens when someone wants to port a feature from Excel to Word? That user would want to take the union of the two sparse-checkout definitions. But what does that mean when we are working with arbitrary sparse-checkout files? This leads to the restriction that I built into this feature: we only care about "cone mode" definitions. These rely on directory-based prefix matches that are very fast. With this restriction, it is simple to understand what the "union" operation should do: take all directories that would be included by either. Once we restrict to cone mode, there is no reason to continue storing files using the sparse-checkout file format. The list of patterns is larger than the input to "git sparse-checkout set" or output from "git sparse-checkout list" in cone mode, and more prone to user error. Thus, let's be simpler and use a list of directories to specify the sparse-checkout definition. This leads to the second issue that complicates things. If a build dependency changes to a core library, but there are N "roles" that depend on that library, then the person changing that library also needs to change and validate N sparse-checkout definitions! This does not scale well. So, let's take a page from build dependencies and allow these in-tree sparse-checkout definitions to depend on each other! The end result is a mechanism that should be flexible enough to future needs but simple enough to build using existing Git features. The format re-uses the config file format. This provides good re-use of code as well as being something easy to extend in the future. In Patch 4, I tried to describe several alternatives and why this is the best of those options. I'm open to suggestions here. OUTLINE ======= * Patch 1 is a bugfix that should work on its own. I caught it in tests after Patch 5. * Patches 2-7 implement the feature. * Patches 8-9 make the Git build system a bit more robust to missing directories. * Patch 10 adds a .sparse directory to Git as well as two in-tree sparse-checkout definitions. One is for a bare-bones Linux developer and the other adds the compat code for Windows on top. As mentioned in patch 10, this allows the following workflow for Git contributors that want to "eat our own dogfood" with the partial clone and sparse-checkout features: $ git clone --sparse --filter=blob:none https://github.com/git/git$ cd git $ git sparse-checkout set --in-tree .sparse/base.deps Then, we have a smaller working directory but can still build and test the code. FUTURE DIRECTIONS ================= There are definitely ways to make this feature more seamless, or more useful to us. * It would be nice to extend git clone to have a string value to --sparse=<file> and possibly have it multi-valued. This would initialize the in-tree sparse-checkout definition immediately upon cloning. * I think there are some ways we could reorganize the Git codebase to make our "most basic" sparse-checkout file have even fewer files. Mostly, it would require modifying the build to ask "do these files exist on-disk? if not, then disable this build option." This would also require adding test dependencies that also disable some tests when those do not exist. * Currently, if we have an in-tree sparse-checkout definition, then we can't add more directories to it manually. If someone runs git sparse-checkout set --in-tree <file> followed by git sparse-checkout add <dir>, then will disappear as it is overridden by the in-tree definition. The only way I can think to get around this would be to add these directories through Git config, since we were using the sparse-checkout file as the storage of "what directories do I care about?" but it is no longer the source of truth in this situation. I'm open to ideas here, and it would be nice to have this interaction hardened before moving the series out of RFC status. Thanks, -Stolee Derrick Stolee (10): unpack-trees: avoid array out-of-bounds error sparse-checkout: move code from builtin sparse-checkout: move code from unpack-trees.c sparse-checkout: allow in-tree definitions sparse-checkout: automatically update in-tree definition sparse-checkout: use oidset to prevent repeat blobs sparse-checkout: define in-tree dependencies Makefile: skip git-gui if dir is missing Makefile: disable GETTEXT when 'po' is missing .sparse: add in-tree sparse-checkout for Git .sparse/base.deps | 19 ++ .sparse/windows.deps | 3 + Documentation/config.txt | 2 + Documentation/config/sparse.txt | 15 + Documentation/git-sparse-checkout.txt | 70 ++++- Makefile | 19 +- builtin/commit.c | 4 +- builtin/read-tree.c | 4 + builtin/sparse-checkout.c | 268 +++-------------- read-cache.c | 8 +- sparse-checkout.c | 418 ++++++++++++++++++++++++++ sparse-checkout.h | 30 ++ t/lib-gettext.sh | 1 - t/t0200-gettext-basic.sh | 6 +- t/t1091-sparse-checkout-builtin.sh | 143 +++++++++ t/test-lib.sh | 7 +- unpack-trees.c | 10 +- 17 files changed, 785 insertions(+), 242 deletions(-) create mode 100644 .sparse/base.deps create mode 100644 .sparse/windows.deps create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h base-commit: ace224ac5fb120e9cae894e31713ab60e91f141f Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-627%2Fderrickstolee%2Fsparse-checkout-in-tree-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-627/derrickstolee/sparse-checkout-in-tree-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/627 -- gitgitgadget