We are implementing a git UI. One interesting case is a repository cloned
with '--filter=tree:0', because it makes basic git operations such as file
log and blame much harder to run. We eventually arrived at the problems
below. We should be able to prepare patches, at least for (2) and (4), if
they are deemed worthwhile and the plan is clear enough. Note that the
patches we consider optimal would involve a protocol change.

(1) Is it even considered a realistic use case?
-----------------------------------------------

I used the Linux repository as an example of a reasonably large repo:
https://github.com/torvalds/linux.git (951025 commits)

I cloned it with various filters and got these stats:

git clone --bare <url>
  7'624'042            objects
  2.86gb               network
  3.10gb               disk

git clone --bare --filter=blob:none <url>
  5'484'714  (71.9%)   objects
  1.01gb     (35.3%)   network
  1.16gb     (37.4%)   disk

git clone --bare --filter=tree:0 <url>
  951'693    (12.5%)   objects
  0.47gb     (16.4%)   network
  0.50gb     (16.1%)   disk

git clone --bare --depth 1 --branch master <url>
  74'380     ( 0.9%)   objects
  0.19gb     ( 6.6%)   network
  0.19gb     ( 6.1%)   disk

My conclusion is that '--filter=tree:0' can be desirable, because it saves
a substantial amount of both disk space and network traffic.

(2) A command to enrich a repo with trees
-----------------------------------------

Since all filters currently include commit objects, it does not seem
possible to fetch just the missing trees into a repository that already
has the commits. It does, however, seem possible to download trees plus
commits like this:

git -c "remote.origin.partialclonefilter=blob:none" fetch --deepen=999999 origin

Here, '--deepen' is a dirty hack to convince git to re-download commits
that are already present locally (albeit without their trees). The '-c'
is a workaround for the problem where 'git fetch' overwrites the filter
in the config; that problem is probably solved by the cooking topic
'fetch: do not override partial clone filter'.
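To see concretely what a treeless clone leaves on disk, here is a minimal
local sketch. It assumes a git new enough for partial clone; the repo
names ('src', 'treeless.git') and the file:// transport with
'uploadpack.allowFilter' enabled are just conveniences for the example,
standing in for a real server:

```shell
#!/bin/sh
# Sketch: build a tiny "server" repo, clone it with --filter=tree:0,
# and inspect which object types actually end up locally.
set -e
tmp=$(mktemp -d)

# A toy upstream with two commits touching a nested path.
git init -q "$tmp/src"
git -C "$tmp/src" config user.email you@example.com
git -C "$tmp/src" config user.name "You"
mkdir -p "$tmp/src/1/2/3"
echo v1 > "$tmp/src/1/2/3/4.txt"
git -C "$tmp/src" add .
git -C "$tmp/src" commit -qm one
echo v2 >> "$tmp/src/1/2/3/4.txt"
git -C "$tmp/src" commit -qam two

# Filtered fetches must be allowed on the "server" side.
git -C "$tmp/src" config uploadpack.allowFilter true

# Treeless bare clone: the transferred pack should contain commits only,
# no trees and no blobs.
git clone -q --bare --filter=tree:0 "file://$tmp/src" "$tmp/treeless.git"
git -C "$tmp/treeless.git" cat-file --batch-all-objects \
    --batch-check='%(objecttype)' | sort | uniq -c
```

On the Linux numbers above this is exactly why the object count drops to
roughly the commit count (951'693 objects for 951'025 commits).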
However, according to the figures in (1), re-downloading the commits
should cost about as much as the original 'clone --filter=tree:0', i.e.
roughly 0.5gb extra in the case of the Linux repo. It would be nice to
avoid that by having a "trees only, please" filter. It would also be nice
to get rid of the '--deepen' hack.

(3) Properly supporting 'git blame' and 'git log -- path'
---------------------------------------------------------

Currently, the promisor machinery downloads missing objects one at a
time, which is very slow. For example, 'git blame' downloads the trees it
needs while processing one commit at a time. See (4) for a possible
solution.

(4) A command to download ALL trees for a subpath
-------------------------------------------------

E.g. for the blamed path '/1/2/3/4.txt', only the parent trees would be
downloaded:

  '/1'
  '/1/2'
  '/1/2/3'

Such a minimal approach stays in line with the user's intention in
choosing '--filter=tree:0': the user obviously wanted to minimize
something, be it disk or network. It would not be nice if the first
'git blame' quietly turned the repo into one with all trees present, as
if it had been cloned with '--filter=blob:none'.

Currently '--filter=sparse:oid' exists to support this, but it is very
hard to use on the client side, because it requires the list of paths to
already be present in a blob in a commit on the server.

As a possible solution, a filter along these lines sounds reasonable:

  --filter=sparse:pathlist=/1/2

The path list could be delimited with some special character, and the
paths themselves could be escaped.

On top of helping 'git blame' and 'git log', such a feature should also
help a lot with sparse clones of large mono-repos, such as Google's
super-mono-repo.
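The slow path described in (3) can be reproduced locally. The sketch
below (same throwaway 'src'/'treeless.git' setup as before, file://
transport standing in for a real server) shows that a pathspec-limited
log in a treeless clone only works because the promisor machinery faults
in the missing trees on demand; against a real server each such miss is
a separate round trip, which is what makes blame/log so slow:

```shell
#!/bin/sh
# Sketch: in a treeless clone, 'git log -- path' lazily fetches the
# missing trees from the promisor remote, commit by commit.
set -e
tmp=$(mktemp -d)

git init -q "$tmp/src"
git -C "$tmp/src" config user.email you@example.com
git -C "$tmp/src" config user.name "You"
mkdir -p "$tmp/src/1/2/3"
echo v1 > "$tmp/src/1/2/3/4.txt"
git -C "$tmp/src" add .
git -C "$tmp/src" commit -qm one
echo v2 >> "$tmp/src/1/2/3/4.txt"
git -C "$tmp/src" commit -qam two
git -C "$tmp/src" config uploadpack.allowFilter true

git clone -q --bare --filter=tree:0 "file://$tmp/src" "$tmp/treeless.git"

# No tree objects locally yet:
git -C "$tmp/treeless.git" cat-file --batch-all-objects \
    --batch-check='%(objecttype)' | sort -u

# A pathspec-limited log forces the missing trees to be fetched on
# demand from the promisor remote (origin):
git -C "$tmp/treeless.git" log --oneline -- 1/2/3/4.txt

# Tree objects have now appeared locally as a side effect:
git -C "$tmp/treeless.git" cat-file --batch-all-objects \
    --batch-check='%(objecttype)' | sort -u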