Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX

Derrick Stolee <stolee@xxxxxxxxx> · Tue, 31 Aug 2021 13:17:33 -0400

On 8/31/2021 12:43 PM, Taylor Blau wrote:
> On Tue, Aug 31, 2021 at 09:33:38AM -0700, Junio C Hamano wrote:
>> I do not see the reasoning behind "should not be a blocker" from
>> Derrick substantiated.  What's the reason why that raw object store
>> cannot come from an existing repository, and what's the benefit we
>> get from not having to have a repository there?
> 
> I also didn't find the reasoning spelled out in his response, but I have
> definitely had off-list discussions with Stolee where it was important to
> be able to pass a value to `--object-dir` which does *not* belong to a
> Git repository (but is used as a dumping ground for packs, a MIDX, and
> loose objects).
> 
> It may be worthwhile to recapitulate that discussion here on the list.
> (I'm hoping that Stolee won't mind filling in the details, since I seem
> to have forgotten most of them).

The way we have been using alternates in VFS for Git and Scalar is as a
"shared object cache" used by multiple full Git repositories, each with
its own working tree. The shared object cache lives in a location that
can be determined during "scalar clone", such as

	~/.scalarCache/url_<hash-of-URL>/

This directory contains the same data as a .git/objects directory would.
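
As a rough sketch of that layout (this is not how Scalar sets it up
internally; the temporary paths and variable names below are made up for
illustration), a full repository can be pointed at such a non-repository
object directory through its alternates file:

```shell
#!/bin/sh
set -e

# Hypothetical locations, for illustration only.
work=$(mktemp -d)
cache="$work/cache"   # stands in for the shared object cache
repo="$work/repo"     # a full repository with its own working tree

# The cache is not a repository; it only mirrors the shape of .git/objects.
mkdir -p "$cache/pack" "$cache/info"

# Create the repository and register the cache as an alternate object store.
git init -q "$repo"
echo "$cache" >"$repo/.git/objects/info/alternates"

# Git reports the alternates it knows about in the verbose object count.
git -C "$repo" count-objects -v
```

Objects placed in the cache (packs under pack/, or loose objects) then
become readable from every repository that lists it as an alternate.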

Data is added to that cache using hooks during 'git fetch' or other
requests for remote data. This means that the second "scalar clone"
command is much faster than the first, because it already has most of
the commit and tree data required to satisfy the partial clone.

(Note: this feature does not exist in the current Scalar CLI RFC, but
would be contributed later.)

These caches were designed before the multi-pack-index existed. In fact,
they were an inspiration for it, because deleting a repo no longer
cleaned up old pack-files. The data would be added as a raw pack-file
that is processed with 'git index-pack' or as loose objects. The
--object-dir option was created specifically as a way to target the
creation and maintenance of a multi-pack-index within one of these
caches, which do not exist as full repositories. Clearly, there were
some gaps in that implementation and I regret creating those gaps.

If I were to redesign the shared object cache, I would create the cache
directories as bare repos and then create each "clone" repo as a
worktree linked to that base. That would allow all objects and refs to
be shared, achieving the same goals and an even better user experience.

I'm advocating that we continue to allow this feature to exist without
requiring an on-upgrade conversion of these non-repos to full repos.
That conversion may be the best thing to do in the long term, but it
will take some time. Keeping compatibility for now seems like it won't
hurt too much.

Thanks,
-Stolee