This patch series adds a `--sparse-prefix=` option to multiple commands,
allowing fetching repository contents from only a subdirectory of a remote.

This works along with sparse-checkout, and is especially useful for
repositories where a subdirectory has meaning when standing alone.

* Motivation (example use cases)

1. Git repositories used for managing large/binary files
   My university has a repository containing lecture slides etc. as PDFs,
   with a subdirectory for each lecture. Getting the whole repository
   (even with --depth=1) transfers 4GiB and takes significant processing
   time; getting the complete history of a single lecture transfers 25MiB
   and completes instantly.
2. Package-manager-like repositories. Examples:
   a) Arch Linux package build files repository [1]
   b) Rust crates.io packages [2]
   c) TypeScript type definitions [3]
3. Excluding a specific directory containing e.g. large binary assets
   Not currently possible with this patch set, but could be added
   (see problem 2 below).
4. Getting the history of a single file
5. Other uses
   I am not a kernel developer, but I wanted to quickly search through the
   code of only the btrfs filesystem using the git tools, and I do not have
   a local clone of the complete repository. Using `--depth=100` in
   combination with `--sparse-prefix=/fs/btrfs` keeps bandwidth usage low
   while still retaining some history.
6. This is trivial in SVN, and an internet search turns up multiple
   questions about this feature [4-7]

* Example usage:

Getting the source of the btrfs filesystem with a bit of history:

    $ git clone git@server:linux --depth=100 # shallow, not sparse
    Receiving objects: 100% (814945/814945), 438.55 MiB | 35.21 MiB/s, done.
    ...
    $ git clone git@server:linux --depth=100 --sparse-prefix=/fs/btrfs # sparse and shallow
    Receiving objects: 100% (503747/503747), 121.45 MiB | 59.75 MiB/s, done.
    ...
    $ cd linux && ls ./
    fs
    $ ls fs/
    btrfs
    $ git log --oneline
    (repo behaves the same as a full clone with sparse-checkout /fs/btrfs)

* Open problems:

1. Currently all trees are still included. It would be possible to include
   only the trees relevant to the sparse files, which would significantly
   reduce the pack sizes for repositories containing a lot of small files
   that change often, for example package managers using git. I am not sure
   in how many places all trees are presumed present.
2. This patch set implements the filter as a single prefix check given as a
   command line option. Using the exclude_list format (same as in
   sparse-checkout) might be useful. The server needs to check these
   patterns for all files in history, so I'm not sure if allowing
   multiple/complex patterns is a good idea.
3. This patch set assumes the sparse-prefix and sparse-checkout do not
   change. Both clone and every later fetch need to be given the
   --sparse-prefix= option, otherwise complete packs will be fetched.
   I am not sure what the best way to store this information is; possibly
   a new file `.git/sparse`, similar to `.git/shallow`, containing the
   path(s).
4. Bitmap indices cannot be used, because they do not contain the paths of
   the objects. So for creating packs, the whole DAG has to be walked.
5. Fsck complains about missing blobs. Should be fairly easy to fix.
6. Tests and documentation are missing.
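
To make the "single prefix check" from the open problems concrete, here is a
minimal shell sketch of the matching semantics one might expect. It is not
taken from the patches: the function name `path_in_sparse_prefix`, the
handling of leading and trailing slashes, the treatment of an empty prefix,
and the sample paths are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the patch series: exit status 0 if the
# path in $1 lies inside the directory named by the prefix in $2.
path_in_sparse_prefix() {
    sp_path=${1#/}            # drop one leading slash: "/fs/btrfs" == "fs/btrfs"
    sp_prefix=${2#/}
    sp_prefix=${sp_prefix%/}  # drop one trailing slash from the prefix

    # Assumption: an empty prefix selects the whole repository.
    [ -z "$sp_prefix" ] && return 0

    # Match only at a path component boundary, so "fs/btrfs" does not
    # accidentally select a sibling such as "fs/btrfs-progs".
    case "$sp_path" in
        "$sp_prefix" | "$sp_prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative paths only; they are not read from any repository.
for p in fs/btrfs/inode.c fs/btrfs fs/btrfs-progs/x.c fs/ext4/inode.c; do
    if path_in_sparse_prefix "$p" /fs/btrfs; then
        echo "keep $p"
    else
        echo "skip $p"
    fi
done
```

With these rules the first two paths are kept and the last two are skipped.
A check this cheap is what makes it plausible for the server to run it on
every path in history; a full exclude_list pattern set would cost more per
path.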

[1]: https://git.archlinux.org/svntogit/packages.git/
[2]: https://github.com/rust-lang/crates.io-index
[3]: https://github.com/DefinitelyTyped/DefinitelyTyped
[4]: https://stackoverflow.com/questions/600079/is-there-any-way-to-clone-a-git-repositorys-sub-directory-only
[5]: https://stackoverflow.com/questions/11834386/cloning-only-a-subdirectory-with-git
[6]: https://askubuntu.com/questions/460885/how-to-clone-git-repository-only-some-directories
[7]: https://coderwall.com/p/o2fasg/how-to-download-a-project-subdirectory-from-github

Robin Ruede (7):
  list-objects: add sparse-prefix option to rev_info
  pack-objects: add sparse-prefix
  Skip checking integrity of files ignored by sparse
  fetch-pack: add sparse prefix to smart protocol
  fetch: add sparse-prefix option
  clone: add sparse-prefix option
  remote-curl: add sparse prefix

 builtin/clone.c        | 27 ++++++++++++++++++++++++---
 builtin/fetch-pack.c   |  6 ++++++
 builtin/fetch.c        | 19 ++++++++++++++-----
 builtin/pack-objects.c | 11 +++++++++++
 cache-tree.c           |  3 ++-
 connected.c            |  7 ++++++-
 fetch-pack.c           |  4 ++++
 fetch-pack.h           |  1 +
 list-objects.c         |  4 +++-
 remote-curl.c          | 17 ++++++++++++++++-
 revision.c             |  4 ++++
 revision.h             |  1 +
 transport.c            |  4 ++++
 transport.h            |  4 ++++
 upload-pack.c          | 15 ++++++++++++++-
 15 files changed, 114 insertions(+), 13 deletions(-)

--
2.9.1.283.g3ca5b4c.dirty