Re: [PATCH 1/1] mingw: only test index entries for backslashes, not tree entries

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Tue, 31 Dec 2019 23:51:35 +0100 (CET)

Hi Junio & Jonathan,

On Thu, 26 Dec 2019, Junio C Hamano wrote:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> > Jonathan Nieder <jrnieder@xxxxxxxxx> writes:
> >
> >> Is there anything we can or should do to prevent people checking in
> >> new examples of paths with backslash in them (on all platforms)?
> >
> > I obviously won't dictate what should happen on Windows, but I think
> > the overall principle for paths recorded in a tree object that can
> > be problematic on some of the platforms ought to be:
> >
> >  * fsck and transfer.fsckobjects should be taught to notice
> >    offending characteristics (e.g. has a backslash in it, is one of
> >    the "reserved names" on some platform like LPT1).

Agree. This is on my radar, but so far not too-high priority, as the
`fsck` checks are not (yet?) standard practice in PR builds (and warnings
on the server side are prone to be ignored).

> >  * if paths with the offending characteristics are *so* obviously
> >    useless in real life and are possible only in a crafted path that
> >    is only useful to attack users, the check in fsck should default
> >    to "reject" to help the disease spread via hosting sites.

I don't think that reserved names such as `aux`, nor names containing
backslashes should be rejected _always_. While I cannot think of _any_
instance where I would want to have a backslash in a file name, I am sure
that just like `aux.c`, there _must_ be somebody out there who thought of
a file name that contains a backslash and makes at least some sort of
sense.

> >  * otherwise, the check should be to "warn" but not "reject", so
> >    that projects can keep using paths that may problematic on
> >    platforms that do not matter to them.

Yes, it should be "warn".

> > I think LPT1 and friends fall into the "warning is fine" category,
> > and a path component that contains a backslash would fall into the
> > "this is an attack, just reject" category.
>
> I guess I should have stepped back a bit.
>
> In the message I am responding to, I focused solely on how tree
> objects that originate elsewhere should be treated, but there are
> two more data paths we need to worry about:
>
>  * A new path gets introduced to the system, via "update-index",
>    "add", etc., to the index and then "write-tree" to form tree
>    objects in the local repository.

Right, that's what I had in mind when I wrote this patch. The path gets
added to the index, we detect a backslash, and on Windows (under
`core.protectNTFS`) fail with an error.

>  * A path in the index, either created locally via "update-index",
>    "add", etc., or read from a tree object, gets written to the
>    local filesystem, resulting in an attempt to create a path that
>    the local filesystem cannot represent (or worse---behaves badly,
>    like "sending random garbage to the printer").

Happily, my patch seems to catch this code path, too: when reading from a
`tree` object into the index, we use `add_index_entry()` (called via
various code paths in the `unpack_trees()` machinery). That's exactly the
patched function.

Or maybe you know of a code path in the `unpack_trees()` machinery that
does _not_ go through `add_index_entry()`? I would be very interested to
learn about such code paths.

> I think we should apply the same principle as the one I outlined for
> the tree objects.  The fsckobjects mechanism may not be reusable to
> catch violations in add_index_entry_with_check() as-is, but we need
> to aim for as much reuse of logic and code as possible so that our
> checks at various layers all behave consistently.

I am afraid that we won't be able to reuse code paths for checking the
backslash here, but for reserved names I am planning on refactoring the
code accordingly.

Thanks,
Dscho