Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> Teach sha1_file to invoke the command configured in
> extensions.lazyObject whenever an object is requested and unavailable.
>
> The usage of the hook can be suppressed through a flag when invoking
> has_object_file_with_flags() and other similar functions.
>
> This is meant as a temporary measure to ensure that all Git commands
> work in such a situation. Future patches will update some commands to
> either tolerate missing objects (without invoking the command) or be
> more efficient in invoking this command.
>
> In order to determine the code changes in sha1_file.c necessary, I
> investigated the following:
>  (1) functions in sha1_file that take in a hash, without the user
>      regarding how the object is stored (loose or packed)
>  (2) functions in sha1_file that operate on packed objects (because I
>      need to check callers that know about the loose/packed distinction
>      and operate on both differently, and ensure that they can handle
>      the concept of objects that are neither loose nor packed)
>
> (1) is handled by the modification to sha1_object_info_extended().
>
> For (2), I looked at for_each_packed_object and at the packed-related
> functions that take in a hash.
> For for_each_packed_object, the callers
> either already work or are fixed in this patch:
>  - reachable - only to find recent objects
>  - builtin/fsck - already knows about missing objects
>  - builtin/cat-file - warning message added in this commit
>
> Callers of the other functions do not need to be changed:
>  - parse_pack_index
>    - http - indirectly from http_get_info_packs
>  - find_pack_entry_one
>    - this searches a single pack that is provided as an argument; the
>      caller already knows (through other means) that the sought object
>      is in a specific pack
>  - find_sha1_pack
>    - fast-import - appears to be an optimization to not store a
>      file if it is already in a pack
>    - http-walker - to search through a struct alt_base
>    - http-push - to search through remote packs
>  - has_sha1_pack
>    - builtin/fsck - already knows about promised objects
>    - builtin/count-objects - informational purposes only (check if loose
>      object is also packed)
>    - builtin/prune-packed - check if object to be pruned is packed (if
>      not, don't prune it)
>    - revision - used to exclude packed objects if requested by user
>    - diff - just for optimization
>
> An alternative design that I considered but rejected:
>
>  - Adding a hook whenever a packed object is requested, not on any
>    object. That is, whenever we attempt to search the packfiles for an
>    object, if it is missing (from the packfiles and from the loose
>    object storage), to invoke the hook (which must then store it as a
>    packfile), open the packfile the hook generated, and report that the
>    object is found in that new packfile. This reduces the amount of
>    analysis needed (in that we only need to look at how packed objects
>    are handled), but requires that the hook generate packfiles (or for
>    sha1_file to pack whatever loose objects are generated), creating one
>    packfile for each missing object and potentially very many packfiles
>    that must be linearly searched.
>    This may be tolerable now for repos
>    that only have a few missing objects (for example, repos that only
>    want to exclude large blobs), and might be tolerable in the future if
>    we have batching support for the most commonly used commands, but is
>    not tolerable now for repos that exclude a large amount of objects.
>
> Helped-by: Ben Peart <benpeart@xxxxxxxxxxxxx>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---

Even though I said a hugely negative thing about the "missing
objects are always OK" butchering of fsck, I do like what this patch
does. The interface is reasonably well isolated, and moving of the
long-running-process documentation to a standalone file is very
sensible.
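
For readers skimming the thread, the behaviour the commit message
describes (fall back to the configured command when an object is
neither loose nor packed, unless the caller passes a flag to suppress
that) can be sketched roughly as below. This is a toy model and not
code from the patch: every toy_* name and TOY_SKIP_LAZY_FETCH are
made up for illustration, the "store" is just an array of strings, and
toy_lazy_fetch() merely stands in for running whatever command
extensions.lazyObject names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag: a caller sets it to suppress the lazy fetch. */
#define TOY_SKIP_LAZY_FETCH 1u

#define TOY_MAX_OBJECTS 8

/* Toy stand-in for the local object store (loose and packed alike). */
static const char *toy_store[TOY_MAX_OBJECTS];
static int toy_store_nr;

/* Counts how often the stand-in for the configured command was run. */
static int toy_fetch_calls;

static int toy_store_has(const char *id)
{
	int i;

	for (i = 0; i < toy_store_nr; i++)
		if (!strcmp(toy_store[i], id))
			return 1;
	return 0;
}

/*
 * Stand-in for invoking the configured command (assumption: the real
 * command would obtain the object and write it into local storage).
 * Here it simply records the id, failing only when the toy store is full.
 */
static int toy_lazy_fetch(const char *id)
{
	toy_fetch_calls++;
	if (toy_store_nr >= TOY_MAX_OBJECTS)
		return -1;
	toy_store[toy_store_nr++] = id;
	return 0;
}

/*
 * Returns 1 if the object is available, 0 otherwise.  On a local miss
 * the fetch is attempted once, unless the caller asked to skip it.
 */
static int toy_has_object(const char *id, unsigned flags)
{
	assert(id != NULL);

	if (toy_store_has(id))
		return 1;
	if (flags & TOY_SKIP_LAZY_FETCH)
		return 0;	/* caller tolerates missing objects */
	if (toy_lazy_fetch(id) < 0)
		return 0;
	return toy_store_has(id);	/* retry after the fetch */
}
```

A caller that can tolerate a missing object would pass
TOY_SKIP_LAZY_FETCH and get 0 back without the command being run; any
other caller triggers one fetch on the first miss and then finds the
object locally on later lookups.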