On Mon, Mar 13, 2017 at 02:33:09PM -0700, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes: > > >> > + if (strbuf_readlink(&link, ce->name, ce_namelen(ce)) == 0) { > >> > + strbuf_add(&path, state->base_dir, state->base_dir_len); > >> > + strbuf_add(&path, ce->name, ce_namelen(ce)); > >> > + > >> > + write_file_buf(path.buf, link.buf, link.len); > >> > >> This does "write content into symlink stand-in file", but why? > > > > From the commit message: > > > > > Detect the null object ID for symlinks in dir-diff so that > > > difftool can prepare temporary files that matches how git > > > handles symlinks. > > Yes I read what the proposed log message said, and noticed that > symbolic link is _always_ written as a regular file. > > I was questioning why. I know Git falls back to such behaviour when > the filesystem does not support symbolic links. "Why do so > unconditionally, even when the filesystem does?" was the question. > > > The obvious connection: when core.symlinks = false, Git already falls back > > to writing plain files with the link target as contents. This function > > does the same, for the same motivation: it is the best we can do in this > > case. > > And that "obvious connection" does not answer the question. Thanks for the thorough review. I'll try to answer questions from the various emails in just this one spot in case it helps. Dscho wrote: > Given that we explicitly ask use_wt_file() whether we can use the > worktree's file, and we get the answer "no" before we enter the modified > code block, I would really expect us *not* to want to read the link from > disk at all. That probably deserves a comment on its own. The use_wt_file() function really answers the question -- "does the worktree contain content that does is unknown to Git, and thus we should symlink to the worktree?" Since these are symlinks, and symlinks are already used to point back to the worktree (so that difftools will write back to the worktree in case the user edited in-tool) then the meaning of use_wt_file() in this context can be misleading. What we're trying to do is handle the case where Git knows it's dealing with an entry that it wants to checkout into a temporary area but it has no way to do so. These entries always have the 0{40} null SHA1 because that is what git diff-files reports for content that exists in the worktree only. Dscho wrote: > I think you are right, we cannot simply call strbuf_readlink(), we would > have to check the core_symlinks variable to maybe read the file contents > instead. > > But then, it may not be appropriate to read the worktree to begin > with... > see my reply to the patch that I will send out in a couple of minutes. In this case, where the null SHA1 indicates that it is unknown content, then I believe we must read from the worktree to simulate what Git would have checked out. Checking for core.symlinks should probably be done before calling strbuf_readlink, though. Junio wrote: > > > Detect the null object ID for symlinks in dir-diff so that > > > difftool can prepare temporary files that matches how git > > > handles symlinks. > > Yes I read what the proposed log message said, and noticed that > symbolic link is _always_ written as a regular file. > > I was questioning why. I know Git falls back to such behaviour when > the filesystem does not support symbolic links. "Why do so > unconditionally, even when the filesystem does?" was the question. I have to re-read the code to see where this is special-cased, but it seems that symlinks are always written as raw files by the dir-diff code for the purposes of full-tree diffing. I think the "why" is tied up in the implementation details of the symlink-back-to-the-worktree-to-allow-editing feature. By special-casing in-tree symlinks and writing them as raw files we can hijack on-disk symlinks. We use on-disk symlinks to link back to the worktree so that external diff tools can write to the worktree through the symlink. Junio wrote: > On this part I didn't comment in my previous message, but what is > the implication of omitting add-left-or-right and not registering > this symbolic link modified in the working tree to the symlinks2 > table? > > I am wondering if these should be more like > > if (S_ISLNK(lmode) { > char *content = get_symlink(src_path, &loid); > add_left_or_right(&symlinks2, src_path, content, 0); > free(content); > } > > with get_symlink() helper that does > > - if the object name is not 0{40}, read from the object store > > - if the object name is 0{40}, that means we need to read the real > contents from the working tree file, so do the "readlink(2) if > symbolic link is supported, otherwise open/read/close the stub > file sitting there" thing. > > Similary to the right hand side tree. I'll take a look at trying this out. Reading the code again, the point of add_left_or_right() is to populate the worktree (done later in the loop) with the stuff we read from Git. Thus, if we changed just this section to call get_symlink() then we should not even try to checkout any symlink entries in !use_wt_file(...) block where checkout_entry() / create_symlink_file() are called. Since the symlinks2 hashmap already populates the worktree then that code should instead simply skip symlinks. Later, once we get to the part where we copy stuff back into the worktree, it should be noted that we always skip over symlinks. We simply do not handle them, never have, and I don't think we really should. The use case that we're trying to handle is a common use case where the user is using dir-diff and uses the difftool to edit a file with content that exists in the worktree only. For that use case, a symlink to the worktree is created in the temp area, and Git does not need to do anything special. I do not think the use case of a user editing a symlink stand-in file, and then having Git update the worktree with the updated content, is common or something we want to support. I'd prefer to keep the use case simple since the code is already complicated enough. I'll take a stab at adding a get_symlink() helper, adjust the code so that add_left_or_right() is populated, and special-case the checkout_entry() code path to simply skip over null SHA1s. I'll also address the review notes and try to add more comments to describe what exactly this code does and why it does it. Do the tests make sense? One minor thing I noticed is that I had to use "echo -n" for the stuff coming out of strbuf_readlink(), and plain "echo" for entries that come in via read_sha1_file() content passed to add_left_or_right(). That suggests that maybe I should append a newline to the output from strbuf_readlink() so that it matches Git. Does that sound right? Does Git store symlink entries with a terminating newline? Please let me know if I'm missing something. cheers, -- David