Re: [PATCH] difftool: handle changing symlinks in dir-diff mode

David Aguilar <davvid@xxxxxxxxx> · Mon, 13 Mar 2017 19:20:30 -0700

On Mon, Mar 13, 2017 at 02:33:09PM -0700, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
> 
> >> > +	if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) {
> >> > +		strbuf_add(&path, state->base_dir, state->base_dir_len);
> >> > +		strbuf_add(&path, ce->name, ce_namelen(ce));
> >> > +
> >> > +		write_file_buf(path.buf, link.buf, link.len);
> >> 
> >> This does "write content into symlink stand-in file", but why?
> >
> > From the commit message:
> >
> > 	> Detect the null object ID for symlinks in dir-diff so that
> > 	> difftool can prepare temporary files that matches how git
> > 	> handles symlinks.
> 
> Yes I read what the proposed log message said, and noticed that
> symbolic link is _always_ written as a regular file.
> 
> I was questioning why.  I know Git falls back to such behaviour when
> the filesystem does not support symbolic links.  "Why do so
> unconditionally, even when the filesystem does?" was the question.
> 
> > The obvious connection: when core.symlinks = false, Git already falls back
> > to writing plain files with the link target as contents. This function
> > does the same, for the same motivation: it is the best we can do in this
> > case.
> 
> And that "obvious connection" does not answer the question.

Thanks for the thorough review.

I'll try to answer questions from the various emails in just this
one spot in case it helps.

Dscho wrote:
> Given that we explicitly ask use_wt_file() whether we can use the
> worktree's file, and we get the answer "no" before we enter the modified
> code block, I would really expect us *not* to want to read the link from
> disk at all.

That probably deserves a comment on its own.

The use_wt_file() function really answers the question --
"does the worktree contain content that does is unknown to
 Git, and thus we should symlink to the worktree?"

Since these are symlinks, and symlinks are already used to
point back to the worktree (so that difftools will write
back to the worktree in case the user edited in-tool)
then the meaning of use_wt_file() in this context
can be misleading.

What we're trying to do is handle the case where Git knows it's dealing
with an entry that it wants to checkout into a temporary area
but it has no way to do so.  These entries always have the
0{40} null SHA1 because that is what git diff-files reports
for content that exists in the worktree only.

Dscho wrote:
> I think you are right, we cannot simply call strbuf_readlink(), we would
> have to check the core_symlinks variable to maybe read the file contents
> instead.
>
> But then, it may not be appropriate to read the worktree to begin
> with...
> see my reply to the patch that I will send out in a couple of minutes.

In this case, where the null SHA1 indicates that it is unknown content,
then I believe we must read from the worktree to simulate what Git
would have checked out.  Checking for core.symlinks should probably be
done before calling strbuf_readlink, though.

Junio wrote:
> > > Detect the null object ID for symlinks in dir-diff so that
> > > difftool can prepare temporary files that matches how git
> > > handles symlinks.
>
> Yes I read what the proposed log message said, and noticed that
> symbolic link is _always_ written as a regular file.
>
> I was questioning why.  I know Git falls back to such behaviour when
> the filesystem does not support symbolic links.  "Why do so
> unconditionally, even when the filesystem does?" was the question.

I have to re-read the code to see where this is special-cased, but
it seems that symlinks are always written as raw files by the
dir-diff code for the purposes of full-tree diffing.

I think the "why" is tied up in the implementation details of
the symlink-back-to-the-worktree-to-allow-editing feature.
By special-casing in-tree symlinks and writing them as raw
files we can hijack on-disk symlinks.  We use on-disk symlinks to
link back to the worktree so that external diff tools can write
to the worktree through the symlink.

Junio wrote:
> On this part I didn't comment in my previous message, but what is
> the implication of omitting add-left-or-right and not registering
> this symbolic link modified in the working tree to the symlinks2
> table?
>
> I am wondering if these should be more like
>
>         if (S_ISLNK(lmode) {
>                 char *content = get_symlink(src_path, &loid);
>                 add_left_or_right(&symlinks2, src_path, content, 0);
>                 free(content);
>         }
>
> with get_symlink() helper that does
>
>  - if the object name is not 0{40}, read from the object store
>
>  - if the object name is 0{40}, that means we need to read the real
>    contents from the working tree file, so do the "readlink(2) if
>    symbolic link is supported, otherwise open/read/close the stub
>    file sitting there" thing.
>
> Similary to the right hand side tree.

I'll take a look at trying this out.

Reading the code again, the point of add_left_or_right()
is to populate the worktree (done later in the loop) with
the stuff we read from Git.  Thus, if we changed just this
section to call get_symlink() then we should not even try
to checkout any symlink entries in !use_wt_file(...)
block where checkout_entry() / create_symlink_file() are called.

Since the symlinks2 hashmap already populates the worktree
then that code should instead simply skip symlinks.

Later, once we get to the part where we copy stuff back
into the worktree, it should be noted that we always
skip over symlinks.  We simply do not handle them,
never have, and I don't think we really should.

The use case that we're trying to handle is a common
use case where the user is using dir-diff and uses the
difftool to edit a file with content that exists in the
worktree only.  For that use case, a symlink to the worktree
is created in the temp area, and Git does not need to do
anything special.

I do not think the use case of a user editing a symlink
stand-in file, and then having Git update the worktree
with the updated content, is common or something we
want to support.  I'd prefer to keep the use case simple
since the code is already complicated enough.

I'll take a stab at adding a get_symlink() helper, adjust
the code so that add_left_or_right() is populated, and
special-case the checkout_entry() code path to simply skip
over null SHA1s.

I'll also address the review notes and try to add more
comments to describe what exactly this code does and why
it does it.

Do the tests make sense?

One minor thing I noticed is that I had to use "echo -n"
for the stuff coming out of strbuf_readlink(), and
plain "echo" for entries that come in via read_sha1_file()
content passed to add_left_or_right().

That suggests that maybe I should append a newline to the
output from strbuf_readlink() so that it matches Git.
Does that sound right?  Does Git store symlink entries
with a terminating newline?

Please let me know if I'm missing something.

cheers,
-- 
David