Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index

Thomas Guyot-Sionnest <tguyot@xxxxxxxxx> · Fri, 18 Sep 2020 12:34:48 -0400

Hi Taylor,

On Fri, 18 Sep 2020 at 10:36, Taylor Blau <me@xxxxxxxxxxxx> wrote:
> On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote:
> > A very handy way to pass data to applications is to use the <() process
> > substitution syntax in bash variants. It allow comparing files streamed
> > from a remote server or doing on-the-fly stream processing to alter the
> > diff. These are usually implemented as a symlink that points to a bogus
> > name (ex "pipe:[209326419]") but opens as a pipe.
>
> This is true in bash, but sh does not support process substitution with
> <().

Bash, ksh, zsh and likely any more moden shell. Other programming
languages also setup such pipes. It's much cleaner than creating temp
files and cleaning them up and in some cases faster too (I've ran
diff's like this over GB's of test data, it's very handy to remove
known patterns that would cause needless diffs).

> > +/* Check that file is - (STDIN) or unnamed pipe - explicitly
> > + * avoid on-disk named pipes which could block
> > + */
> > +static int ispipe(const char *name)
> > +{
> > +     struct stat st;
> > +
> > +     if (name == file_from_standard_input)
> > +             return 1;  /* STDIN */
> > +
> > +     if (!lstat(name, &st)) {
> > +             if (S_ISLNK(st.st_mode)) {
>
> I had to read this a few times to make sure that I got it; you want to
> stat the link itself, and then check that it links to a pipe.
>
> I'm not sure why, though. Do you want to avoid handling named FIFOs in
> the code below? Your comment that they "could block" makes me think you
> do, but I don't know why that would be a problem.

I'll admit the comment was written first and is a bit naive  - i'll
rephrase that. Yes you don't want to block on pipes like if you run a
"grep -R" on a subtree that has fifos - but as I coded this I realized
the obvious: git tracks symlinks name so the real bugger would be to
detect one as a pipe and try reading it instead or calling readlink().

> > +                     /* symlink - read it and check it doesn't exists
> > +                      * as a file yet link to a pipe */
> > +                     struct strbuf sb = STRBUF_INIT;
> > +                     strbuf_realpath(&sb, name, 0);
> > +                     /* We're abusing strbuf_realpath here, it may append
> > +                      * pipe:[NNNNNNNNN] to an abs path */
> > +                     if (!stat(sb.buf, &st))
>
> Statting sb.buf is confusing to me (especially when followed up by
> another stat right below. Could you explain?

The whole block is under lstat/S_ISLNK (see previous chunk), so the
path provided to us was a symlink.

Initially I looked at what differentiate these - mainly, stat() st_dev
- but that struct is os-specific, you'd want to check major(st_dev) ==
0 (at least on linux) and even if we knew how each os behaves, the
code isn't portable and would be a pain to support. Gnu's difftools
have very incomplete historical source code in git but there's
indications they have gotten rid of it too.

So what I'm doing instead is trying to resolve the link and see if the
destination exists (a clear no). Luckily strbuf_realpath does the
heavy lifting and leaves me with a real path to the file the symlink
points to (especially useful for relative links), which is bogus for
the special pipes we're interested in.

Then the block right after (not shown) do a stat() on the initial name
and return whenever it's a fifo or not (if it is, but the link is
broken, we know it's a special device).

Now you mention it, maybe I could do that stat first, rule this out
from the beginning... less work for the general case.

*untested*:

    if (!lstat(name, &st)) {
        if (!S_ISLNK(st.st_mode))
            return(0);
        if (!stat(name, &st)) {
            if (!S_ISFIFO(st.st_mode))
                return(0);

            /* We have a symlink that points to a pipe. If it's resolved
             * target doesn't really exist we can safely assume it's a
             * special file and use it */
            struct strbuf sb = STRBUF_INIT;
            strbuf_realpath(&sb, name, 0);
            /* We're abusing strbuf_realpath here, it may append
             * pipe:[NNNNNNNNN] to an abs path */
            if (stat(sb.buf, &st))
                return(1); /* stat failed, special one */
        }
    }
    return(0);

TL;DR - the conditions we need:

- lstat(name) == 0  // name exists
- islink(lstat(name))  // name is a symlink
- stat(name) == 0  // target of name is reachable
- isfifo(stat(name))  // Target of name is a fifo
- stat(realpath(readlink(name))) != 0  // Although we can reach it,
name's destination doesn't actually exist.

BTW is st/sb too confusing ? I took examples elsewhere in the code, I
can rename them if it's easier to read.

> > +test_expect_success 'diff --no-index can diff piped subshells' '
> > +     echo 1 >non/git/c &&
> > +     test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) &&
> > +     test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c &&
> > +     test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) &&
> > +     test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c &&
> > +     test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - &&
> > +     test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) &&
> > +     test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) -
> > +'
>
> Indeed this test fails (Git thinks that the HERE-DOC is broken, but I
> suspect it's just getting confused by the '<()'). This test (like almost
> all other tests in Git) use /bin/sh as its shebang. Does your /bin/sh
> actually point to bash?
>
> If you did want to test something like this, you'd need to source
> t/lib-bash.sh instead of t/test-lib.sh.

Thanks for the tip - indeed I think I ran the testsuite directly with
back, but the make test failed.

> Unrelated to the above comment, but there are a few small style nits
> that I notice:
>
>   - There is no need to run with 'test_expect_code 0' since the test is
>     marked as 'test_expect_success' and the commands are all in an '&&'
>     chain. (This does appear to be common style for others in t4053, so
>     you may just be matching it--which is fine--but an additional
>     clean-up on top to modernize would be appreciated, too).
>
>   - The cat pipe is unnecessary, and is also violating a rule that we
>     don't place 'git' on the right-hand side of a pipe (can you redirect
>     the file at the end instead?).

Cleanup, no pipelines (I read too fast / assumed last command was ok) - will do!

> Documentation/CodingGuidelines is a great place to look if you are ever
> curious about whether something is in good style.
>
> > +test_expect_success 'diff --no-index finds diff in piped subshells' '
> > +     (
> > +             set -- <(cat /dev/null) <(cat /dev/null)

Precautions/portability. The file names are somewhat dynamic (at least
the fd part...) this is to be sure I capture the names of the pipes
that will be used (assuming the fd's will be reallocated in the same
order which I think is fairly safe). An alternative is to sed "actual"
to remove known variables, but then I hope it would be reliable (and
can I use sed -r?). IIRC earlier versions of bash or on some systems a
temp file could be used for these - although it defeats the purpose
it's not a reason to fail....

I cannot develop this on other systems but I tested the pipe names on
Windows and Sunos, and also using ksh and zsh on Linux (zsh uses /proc
directly, kss uses lower fd's which means it can easily clash with
scripts if you don't use named fd's, but not our problem....)

Thanks,

Thomas