Re: [GSoC][RFC] discussion about stashing with conflicts

Thomas Gummerer <t.gummerer@xxxxxxxxx> · Sun, 7 Apr 2019 19:38:57 +0100

On 04/07, Kapil Jain wrote:
> below is my understanding from reading the parts of code as suggested on IRC.
> 
> what is the use of ce_stage macro ?
> tells about stage of an index entry.
> if ce_stage says, stage #0 i.e staging area, then that index entry is
> in staging area
> and nothing needs to be done.

I don't quite understand what you mean with "nothing needst to be
done" here.  In the context of teaching 'git stash' to handle unmerged
index entries, nothing that is not already being done needs to be done
with an index entry that is at stage #0.  The current implementation
already handles that correctly.

> else a temporary index entry is created and repo_read_index_unmerged()
> calls other function and tries to add it to index.
> if it fails, it issues an error.

Not sure what you mean here.  Index entries with higher stages are not
temporary, they are written out to the index file, and can then be
read back with 'repo_read_index()' for example.

repo_read_index_unmerged() drops entries to stage #0, and marks them
as CE_CONFLICTED, which callers can then use to distinguish index
entries that are conflicted and those that are not.

In the context of the GSoC project, this function is probably not
useful, because we need to separate out entries of different stages,
so we can create commits of those entries, as described in the thread
that's also linked from the project page [*1*].  Once
repo_read_index_unmerged() is called, we no longer know which entry is
at which stage (they are all at stage 0).

*1*: https://public-inbox.org/git/nycvar.QRO.7.76.6.1902072023250.41@xxxxxxxxxxxxxxxxx/

> is this correct interpretation ?
> 
> 1) in repo_read_index_unmerged(), why don't we make the value of
> `unmerged` 0, if adding index entry is successful; as the entry is no
> longer unmerged ?

Because the caller often wants to know if the index is unmerged in the
first place, and would refuse to operate on such an index.  Read the
comment documenting the function again, that explains this very
nicely.  Then see how some callers actually use the function, and
you'll see that they actually don't care about dropping the entry to
stage 0, but they care about knowing whether the index is unmerged or
not.

> 2) what is ADD_CACHE_SKIP_DFCHECK ?
> i am unsure if i get its meaning, cache.h says that it means "Ok to
> skip DF conflict checks"
> what are DF conflict checks ? something about diffing to check for
> conflicts ? if so why are we skipping it this entry had conflicts in
> the past maybe it will create again.

A DF conflict is a directory/file conflict, where one side of the
merge has created a file that has the same name as a directory created
on the other side of the merge.  From a bit of digging around in
git.git, t/t1000-read-tree-m-3way.sh seems to have some good
explanation on the various types of conflicts, not sure if it's
documented somewhere in Documentation/ in detail as well.

In this function we just care about creating an index that doesn't
contain any conflicts, so we are skipping the checks here.

> 3) what is cache_nr variable in index_state struct ? what is its use ?

If you look for cache_nr in the codebase you'll see that it's often
used when iterating through the whole index.  That could give you a
hint what it is.  Alternatively you could go have a look where it is
assigned, and what it is assigned to.  Looking through read-cache.c I
can find this line [*2*].  hdr_entries is in 'struct cache_header',
which is the header of the index file (cache is another name for the
index that's used in the codebase).

Now if you look at Documentation/technical/index-format.txt, you can
see that it's the number of entries that are in the index.

*2*: https://github.com/gitster/git/blob/041f5ea1cf987a4068ef5f39ba0a09be85952064/read-cache.c#L2163

> Now, about add_index_entry_with_check(), i don't fully understand this
> function but concentrating on the part pointed by dscho.
> https://github.com/git/git/blob/v2.21.0/read-cache.c#L1284-L1294
> 
>     /*
>      * Inserting a merged entry ("stage 0") into the index
>      * will always replace all non-merged entries..
>      */
> 
> so this is the part we need to play with for the project
> https://git.github.io/SoC-2019-Ideas/#teach-git-stash-to-handle-unmerged-index-entries
> try and change this in some-way to not replace those unmerged entries
> or store them some place else ?

It would be useful to repeat what the context is in which Dscho gave
you this code to read.  Looking back at the IRC logs, I can see:

    11:55:15 dscho | Also, here is some code that is helpful to understand
    what happens when unmerged entries are collapsed into a merged entry:
    https://github.com/git/git/blob/v2.21.0/read-cache.c#L1284-L1294

And the whole conversation was about trying to understand index
entries better, which is a fundamental part of the project, rather
than code that needs to be changed for the project itself.  Most
likely you will not change any functions in read-cache.c, but you'll
need an understanding of how they do things.

So the question is, did you read this function in depth and understand
what it does?  If you want to validate your understanding of the
function, try to repeat what it does in your own words, and ask for us
to correct you.