Re: [PATCH v2 5/5] sha1_file: support loading lazy objects

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> Teach sha1_file to invoke the command configured in
> extensions.lazyObject whenever an object is requested and unavailable.
>
> The usage of the hook can be suppressed through a flag when invoking
> has_object_file_with_flags() and other similar functions.
>
> This is meant as a temporary measure to ensure that all Git commands
> work in such a situation. Future patches will update some commands to
> either tolerate missing objects (without invoking the command) or be
> more efficient in invoking this command.
>
> In order to determine the code changes in sha1_file.c necessary, I
> investigated the following:
>  (1) functions in sha1_file that take in a hash, without the caller
>      needing to know how the object is stored (loose or packed)
>  (2) functions in sha1_file that operate on packed objects (because I
>      need to check callers that know about the loose/packed distinction
>      and operate on both differently, and ensure that they can handle
>      the concept of objects that are neither loose nor packed)
>
> (1) is handled by the modification to sha1_object_info_extended().
>
> For (2), I looked at for_each_packed_object and at the packed-related
> functions that take in a hash. For for_each_packed_object, the callers
> either already work or are fixed in this patch:
>  - reachable - only to find recent objects
>  - builtin/fsck - already knows about missing objects
>  - builtin/cat-file - warning message added in this commit
>
> Callers of the other functions do not need to be changed:
>  - parse_pack_index
>    - http - indirectly from http_get_info_packs
>  - find_pack_entry_one
>    - this searches a single pack that is provided as an argument; the
>      caller already knows (through other means) that the sought object
>      is in a specific pack
>  - find_sha1_pack
>    - fast-import - appears to be an optimization to not store a
>      file if it is already in a pack
>    - http-walker - to search through a struct alt_base
>    - http-push - to search through remote packs
>  - has_sha1_pack
>    - builtin/fsck - already knows about promised objects
>    - builtin/count-objects - informational purposes only (check if loose
>      object is also packed)
>    - builtin/prune-packed - check if object to be pruned is packed (if
>      not, don't prune it)
>    - revision - used to exclude packed objects if requested by user
>    - diff - just for optimization
>
> An alternative design that I considered but rejected:
>
>  - Adding a hook whenever a packed object is requested, not on any
>    object.  That is, whenever we attempt to search the packfiles for an
>    object and it is missing (from the packfiles and from the loose
>    object storage), we would invoke the hook (which must then store it
>    as a packfile), open the packfile the hook generated, and report that
>    the object is found in that new packfile. This reduces the amount of
>    analysis needed (in that we only need to look at how packed objects
>    are handled), but requires that the hook generate packfiles (or for
>    sha1_file to pack whatever loose objects are generated), creating one
>    packfile for each missing object and potentially very many packfiles
>    that must be linearly searched. This may be tolerable now for repos
>    that only have a few missing objects (for example, repos that only
>    want to exclude large blobs), and might be tolerable in the future if
>    we have batching support for the most commonly used commands, but is
>    not tolerable now for repos that exclude a large number of objects.
>
> Helped-by: Ben Peart <benpeart@xxxxxxxxxxxxx>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---

Even though I said a hugely negative thing about the "missing
objects are always OK" butchering of fsck, I do like what this patch
does.  The interface is reasonably well isolated, and moving of the
long-running-process documentation to a standalone file is very
sensible.
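
The behavior described at the top of the commit message (look locally
first, fall back to the configured command only when the object is
unavailable, and let callers suppress that fallback with a flag) can be
illustrated with a toy model.  This is only a sketch under stated
assumptions, not code from the patch: toy_store, toy_lazy_fetch(),
toy_has_object() and LOOKUP_SKIP_LAZY_FETCH are hypothetical names, and
the "fetch" is an in-memory stand-in for running the command named in
extensions.lazyObject.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, modeled on the suppression flag the commit message mentions. */
#define LOOKUP_SKIP_LAZY_FETCH 1u

#define MAX_OBJECTS 8

/* Toy object store: a flat list of names standing in for loose and packed storage. */
struct toy_store {
	const char *names[MAX_OBJECTS];
	size_t nr;
	int fetch_calls; /* how many times the lazy-object command was "invoked" */
};

/* Purely local lookup; never triggers a fetch. */
static bool store_has_local(const struct toy_store *s, const char *name)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->names[i], name))
			return true;
	return false;
}

/*
 * Stand-in for running the configured command.  Assumption for this
 * sketch: only names starting with "remote-" can be obtained.
 */
static bool toy_lazy_fetch(struct toy_store *s, const char *name)
{
	s->fetch_calls++;
	if (strncmp(name, "remote-", 7) || s->nr == MAX_OBJECTS)
		return false;
	s->names[s->nr++] = name;
	return true;
}

/* Local lookup first; fall back to the fetch unless the caller opted out. */
static bool toy_has_object(struct toy_store *s, const char *name, unsigned flags)
{
	if (store_has_local(s, name))
		return true;
	if (flags & LOOKUP_SKIP_LAZY_FETCH)
		return false;
	if (!toy_lazy_fetch(s, name))
		return false;
	/* Re-check instead of trusting the fetch result blindly. */
	return store_has_local(s, name);
}
```

A caller that can tolerate missing objects would pass
LOOKUP_SKIP_LAZY_FETCH and handle a false return itself; everyone else
passes 0 and pays for one fetch per missing object, which is the cost
that the promised batching work is meant to reduce.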



