Hi Junio & Jonathan, On Thu, 26 Dec 2019, Junio C Hamano wrote: > Junio C Hamano <gitster@xxxxxxxxx> writes: > > > Jonathan Nieder <jrnieder@xxxxxxxxx> writes: > > > >> Is there anything we can or should do to prevent people checking in > >> new examples of paths with backslash in them (on all platforms)? > > > > I obviously won't dictate what should happen on Windows, but I think > > the overall principle for paths recorded in a tree object that can > > be problematic on some of the platforms ought to be: > > > > * fsck and transfer.fsckobjects should be taught to notice > > offending characteristics (e.g. has a backslash in it, is one of > > the "reserved names" on some platform like LPT1). Agree. This is on my radar, but so far not too-high priority, as the `fsck` checks are not (yet?) standard practice in PR builds (and warnings on the server side are prone to be ignored). > > * if paths with the offending characteristics are *so* obviously > > useless in real life and are possible only in a crafted path that > > is only useful to attack users, the check in fsck should default > > to "reject" to help the disease spread via hosting sites. I don't think that reserved names such as `aux`, nor names containing backslashes should be rejected _always_. While I cannot think of _any_ instance where I would want to have a backslash in a file name, I am sure that just like `aux.c`, there _must_ be somebody out there who thought of a file name that contains a backslash and makes at least some sort of sense. > > * otherwise, the check should be to "warn" but not "reject", so > > that projects can keep using paths that may problematic on > > platforms that do not matter to them. Yes, it should be "warn". > > I think LPT1 and friends fall into the "warning is fine" category, > > and a path component that contains a backslash would fall into the > > "this is an attack, just reject" category. > > I guess I should have stepped back a bit. > > In the message I am responding to, I focused solely on how tree > objects that originate elsewhere should be treated, but there are > two more data paths we need to worry about: > > * A new path gets introduced to the system, via "update-index", > "add", etc., to the index and then "write-tree" to form tree > objects in the local repository. Right, that's what I had in mind when I wrote this patch. The path gets added to the index, we detect a backslash, and on Windows (under `core.protectNTFS`) fail with an error. > * A path in the index, either created locally via "update-index", > "add", etc., or read from a tree object, gets written to the > local filesystem, resulting in an attempt to create a path that > the local filesystem cannot represent (or worse---behaves badly, > like "sending random garbage to the printer"). Happily, my patch seems to catch this code path, too: when reading from a `tree` object into the index, we use `add_index_entry()` (called via various code paths in the `unpack_trees()` machinery). That's exactly the patched function. Or maybe you know of a code path in the `unpack_trees()` machinery that does _not_ go through `add_index_entry()`? I would be very interested to learn about such code paths. > I think we should apply the same principle as the one I outlined for > the tree objects. The fsckobjects mechanism may not be reusable to > catch violations in add_index_entry_with_check() as-is, but we need > to aim for as much reuse of logic and code as possible so that our > checks at various layers all behave consistently. I am afraid that we won't be able to reuse code paths for checking the backslash here, but for reserved names I am planning on refactoring the code accordingly. Thanks, Dscho