Re: Bug: problem with file named with dash character

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> From a cursory look, these definitely go in the right direction. I like
> how the third one was able to rip populate_from_stdin entirely out of
> diff.c.
>
> I suspect we could get by without even the nongit_stdin flag you added;
> the only place which uses it is diff_fill_sha1_info, but theoretically
> we don't even need it there; we could just index_mem the file contents
> we get via diff_populate_filespec, and the stdin contents will
> already be there.

Probably that is a sane thing to do.

And as you hinted, we definitely need to give the "the copy in the
filespec->data is the only one; do not discard until we are really
done with it" semantics to the bit, or introduce a separate bit that
is not necessarily related to the "this came from the standard input"
bit.  I offhand do not know what data source we would later add
that needs such a treatment.  "git diff http://example.com/{foo,bar}";,
perhaps?  The "do not discard" bit at that point may need to learn
to swap it out to a temporary file or something when it happens.

> Right now we call index_path for worktree files, but I don't really see
> much point; we have to read the whole data either way, and
> populate_filespec should be mmap-ing them for us.
>
>>  - We say on the "diff --git" header uglyness like "a/-", "b/-";
>>    likewise in the metainfo;
>
> I'd consider changing the path to "/dev/stdin" for this case. It doesn't
> exist on some platforms, of course, but neither does /dev/null, which we
> use similarly.

Sensible.

>>  - We show on the "index" header "0*" value for these entries, even
>>    though we should be able to compute it (after all we do so for
>>    files on disk in a non-git directory);
>
> The index_mem I mentioned above would fix that.

Yes.

>>  - We still apply attributes and textconv as if we are dealing with
>>    a regular file "-" at the root level.
>
> I think that's bad. I wonder if it should have "*" attributes applied to
> it or not. While I can see it being convenient in some cases, I think it
> makes the rules confusingly complex.

It is likely that the crlf conversion on Dos systems wants to be
applied regardless.

This is unrelated to the "standard input confusion" issue, but there
are two more things coming either from the way the no-index code is
done or from the way it is bolted onto git.

With the current code, this:

	mkdir foo bar
        echo hello >foo/1
        echo bye >bar/2
        git diff --no-index foo bar

shows differences between a/foo/1 and b/bar/1, as the no-index code
records foo/1 and bar/1 as the paths in the filespec for them.

But if you are comparing two directory hierarchies, it is a lot more
likely that you would want to see corresponding files in these two
directories.  In other words, the patch is better shown as
differences between a/1 and b/1 (the code makes the core compare
foo/1 and bar/2 after all).  This of course requires no-index to
differentiate the logical pathname (i.e. "this is the path inside
collection a/ (or b/)") and the physical location from which the
contents are read.  Such a differentiation would allow us to also do
renames and rename classifications much more sanely.  We had to add
DIFF_PAIR_RENAME() and filepair->renamed_pair only to support this
codepath because of this misdesign.  We could have just run strcmp()
between the logical pathname of one/two members of the filepair to
see if the pair was renamed if it was done that way.

And the way diff-no-index.c::queue_diff() walks two directories to
grab paths from them in parallel and incrementally means that the
filesystem walking code cannot be reused for something like this:

	git diff master:Documentation /var/tmp/docs

to compare a hierarchy represented with a tree object with another
hierarchy stored in the filesystem outside git's control.  A natural
way to do so would be to grab a set of paths from /var/tmp/docs and
have that set compared against the other set that comes from the tree,
and the "grab a set of paths from /var/tmp/docs" machinery can be
used twice to implement the current

	git diff --no-index /var/tmp/foo /var/tmp/bar

which would have been a lot cleaner implementation.

Oh well.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]