Re: [PATCH v3 01/20] sparse-index: design doc and format update

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 23 Mar 2021 13:10:18 -0700

Derrick Stolee <stolee@xxxxxxxxx> writes:

>>> +Three important scale dimensions for a Git worktree are:
>> 
>> s/worktree/working tree/; The former is the thing the "git worktree"
>> command deals with.  The latter is relevant even when "git worktree"
>> is not used (the traditional "git clone and you get a working tree
>> to work in").
>
> I guess I'm distracted by using SKIP_WORKTREE a lot, but "working
> directory" is more specific and hence better.

Since the user's current working directory can be outside any
working tree that is governed by any git repository, "working
directory" is a term I try to avoid when describing the directory
where a checkout of a revision lives.

Documentation/glossary-content.txt is where the suggestion for
"working tree" comes from.

> I could rearrange things here. The important things to note are:
>
> 1. Updating index entries is very fast, but adds up at large scale.

This is the "checkout to match the index to the tree of HEAD" part,
ignoring the cost of writing working tree files out?

> 2. It is faster to write a file to disk from Git's object database
>    than it is to compare a file on disk to the copy in the database,
>    which is frequently necessary when the mtime on disk doesn't match
>    the mtime in the index.

True.  But of course, not having to do either (i.e. having a fresh
cached stat info) would be even faster ;-).
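
A rough Python sketch of that trade-off, for illustration only (this is
not Git's actual code: `CachedEntry` and `is_modified` are hypothetical
names, and real Git compares more stat fields and handles racy
timestamps). When the cached stat data matches, the file is never opened.
Otherwise the content has to be read and hashed:

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class CachedEntry:
    # Hypothetical, simplified stand-in for the stat data kept per index entry.
    oid: str       # blob object id recorded in the index
    mtime_ns: int  # cached modification time
    size: int      # cached file size


def blob_oid(data: bytes) -> str:
    # A blob's SHA-1 object id is the hash of "blob <size>\0" plus the content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def is_modified(path: str, entry: CachedEntry):
    """Return (modified, how), where how is "stat" or "content"."""
    st = os.stat(path)
    if st.st_mtime_ns == entry.mtime_ns and st.st_size == entry.size:
        # Fresh cached stat info: neither a read nor a write is needed.
        return False, "stat"
    # Stat data differs: fall back to reading and hashing the file.
    with open(path, "rb") as f:
        data = f.read()
    return blob_oid(data) != entry.oid, "content"
```

The "content" path is the expensive one described in point 2 above; the
"stat" path is the case where neither a write nor a comparison happens.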

>> Also it
>> is unclear what you mean by "changing HEAD only require updating the
>> index".  Certainly when "git switch" flips HEAD from one commit to
>> another, you'd update the index and update the files in the working
>> tree (in the Populated part that is in the sparse-checkout cone) to
>> match, no?
>
> This was unclear of me. I was thinking more along the lines of "git
> reset" (soft mode), which updates HEAD without changing the files on disk.

OK, and that is in line with your "updating index entries is very
fast (but adds up)".

> After all of this postulating, I think that the offending sentences
> are better off deleted. They don't add clarity over what can be
> inferred by an interested reader.

OK.

> I'm mixing terms incorrectly. I think what I really mean is
>
>   In fact, these loops expect to see a reference to every
>   staged file.

OK.

>  The plan is to make all of these integrations "sparse aware" so
>  this expansion through tree parsing is unnecessary and they use
>  fewer resources than when using a full index.

;-)

> What I meant by "serialized index file" is that the file written to disk has
> the sparse directory entries, but the in-core copy will not (except for
> a very brief moment in time, during do_read_index()).

Nice.  That would probably mean cache-tree extension on-disk can go
away, because we can populate in-core cache-tree from these entries.
I've always hated the on-disk encoding of that extension.

Or are we not doing this "extra tree" everywhere (i.e. is it limited only
to the parts that are marked for "sparse checkout")?
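
To make the on-disk versus in-core distinction concrete, here is a toy
Python sketch of the expansion being discussed. It is a model under
stated assumptions, not Git's implementation: trees are plain dicts,
object ids are made-up strings, a sparse directory entry is assumed to
be recognizable by a trailing "/" in its path, and `expand_index` is a
hypothetical name. It does not try to reproduce Git's index ordering.

```python
# Toy model: a tree maps a name to either a blob id (str) or a subtree (dict).

def expand_sparse_entry(prefix, tree):
    """Yield (path, blob_id) for every file below one sparse directory entry."""
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            # Subdirectory: recurse with the longer path prefix.
            yield from expand_sparse_entry(prefix + name + "/", child)
        else:
            yield prefix + name, child


def expand_index(entries, trees):
    """Replace each sparse directory entry with the files it stands for.

    `entries` is a list of (path, object_id) as read from the (toy) on-disk
    index; `trees` maps a tree id to its dict, standing in for the object
    database.
    """
    full = []
    for path, oid in entries:
        if path.endswith("/"):
            # Assumed marker for a sparse directory entry in this toy model.
            full.extend(expand_sparse_entry(path, trees[oid]))
        else:
            full.append((path, oid))
    return full
```

In this model the on-disk list keeps one entry for "docs/", while the
expanded in-core list has one entry per file under it.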

Thanks.