On Sat, Feb 17, 2024 at 08:38:08PM +0000, charmocc wrote:

> I was recently exploring git partial clone feature because I wanted to
> contribute to repository which has a lot of binary files. My intent was to only
> add new files without modifying any existing ones and to download as few data
> as possible in the process. Here are the steps I followed:
>
> $ git clone --no-checkout --filter=blob:none https://github.com/libretro-thumbnails/Nintendo_-_Nintendo_Entertainment_System.git nes
> $ cd nes
> $ echo foo > bar
> $ git add bar
> $ git commit bar # causes git fetch behind the scene and download of a lot of objects!
>
> Now for reasons I don't understand the last command cause download of a lot of
> objects from remote (blobs) which is what I was trying to avoid. By enabling
> tracing options I can see that it runs fetch operation in the background:

I think what is happening is something like:

  1. You clone with --no-checkout, so you do not fetch any of the
     blobs. But you also have an empty index, with no entries at all.

  2. Running "git commit" is going to need all of those entries in the
     index (to compute the hash of the new tree). So it will read them
     from the tree of the current HEAD.

  3. When we load entries into the index, the usual next thing to do is
     to check them out. So rather than fetch them one by one as we do
     the actual checkout, the index-reading code collects all of the
     entries we don't have and then does a single fetch for them. This
     is prefetch_cache_entries() in read-cache.c.

Now obviously in your example, the "usual" thing is not happening; we do
not intend to write those entries into the working tree, so fetching
them is pointless.

There may be some room for improvement here. E.g., teaching the
index-reading code a flag that says "don't bother prefetching", and
using it in this call chain. I'm not sure if there would be other
gotchas, though.

But here are a few alternatives that you can try without making any code
changes:

  a. Your --no-checkout skips the checkout, but it does not tell Git
     that you are fundamentally uninterested in those other paths. To do
     that, you can try the sparse-checkout mechanism. I'm not super
     familiar with the feature myself, but doing:

       git clone --sparse --filter=blob:none $url nes

     ends up with an empty checkout to which you can add things (the
     trick is that we do have all of those index entries, but they are
     marked as "not interesting").

     Do note that --sparse checks out the contents of the top-level tree
     by default. That's OK for your repo (all of the files are in the
     Named_Titles directory), but it might not be true for some other
     repos (it may also not work if your intent is to put another entry
     into Named_Titles, though it looks like you might just need to say
     "git add --sparse").

  b. Skip the index entirely and just construct your own tree/commit.
     E.g., doing:

       blob=$(git hash-object -w some-file)
       tree=$({
         git ls-tree HEAD &&
         printf "100644 blob $blob\t%s\n" some-file
       } | git mktree --missing)
       commit=$(echo my commit message | git commit-tree -p HEAD $tree)
       git update-ref HEAD $commit

     It gets a little trickier if you want to add to a sub-directory
     (you have to recursively generate each tree).

In both cases you might also want to clone with "--depth 1", so you do
not bother grabbing old commits and trees, either.

> git version 2.34.1 (Ubuntu 22.04)

The sparse-checkout feature is new-ish and has been actively worked on
in the past few years. What I showed above works with the latest release
of Git, but you may or may not need to upgrade (I didn't dig into the
details).

-Peff
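
For the sub-directory case mentioned under (b), a rough sketch of the
"recursively generate each tree" step for a single level of nesting
might look like the script below. It is only an illustration under
stated assumptions: the throwaway repository, the directory name "sub"
and the file names are all hypothetical stand-ins for a real partial
clone, and deeper paths would need one more mktree pass per directory
level.

```shell
#!/bin/sh
# Sketch: add sub/new-file to HEAD's tree without touching the index.
set -e

# Throwaway repository standing in for the real clone (hypothetical layout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "example@example.com"
mkdir sub
echo old >sub/existing
echo top >README
git add README sub
git commit -q -m initial

# The new content only has to exist long enough to be hashed.
echo foo >new-file
blob=$(git hash-object -w new-file)

# Rebuild the "sub" tree: its existing entries plus the new blob.
subtree=$({
	git ls-tree HEAD:sub &&
	printf '100644 blob %s\t%s\n' "$blob" new-file
} | git mktree --missing)

# Rebuild the top-level tree, replacing the old "sub" entry with the new one.
tree=$({
	git ls-tree HEAD | awk -F'\t' '$2 != "sub"' &&
	printf '040000 tree %s\t%s\n' "$subtree" sub
} | git mktree --missing)

# Commit the new tree on top of HEAD and move the current branch to it.
commit=$(echo "add sub/new-file" | git commit-tree -p HEAD "$tree")
git update-ref HEAD "$commit"

# List the paths recorded in the new commit.
git ls-tree -r --name-only HEAD
```

As in the top-level version, only tree objects are read here, so in a
blob:none clone this should not need any of the existing blobs. Note
that the index and working tree are left alone, so "git status" will
look out of date afterwards.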