On Fri, Jul 01 2022, Glen Choo wrote: Sorry for the late reply, I see there's a v7, but this seems to also apply to it, so... > Thanks for weighing in :) Despite the different proposed approaches, I > think we actually are in broad agreement. > > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: > >> On Thu, Jun 30 2022, Glen Choo via GitGitGadget wrote: >> >>> This is a quick re-roll to address Ævar's comments on the tests (thanks!). >> >> Thanks! >> >>> = Description >> >> Just more generally on this series & approach. I know this is a v6 by >> now, but I haven't kept up with this topic, but to be fair I did mention >> pretty much this in: >> https://lore.kernel.org/git/220407.86lewhc6bz.gmgdl@xxxxxxxxxxxxxxxxxxx/ >> >> So... >> >>> There is a known social engineering attack that takes advantage of the fact >>> that a working tree can include an entire bare repository, including a >>> config file. A user could run a Git command inside the bare repository >>> thinking that the config file of the 'outer' repository would be used, but >>> in reality, the bare repository's config file (which is attacker-controlled) >>> is used, which may result in arbitrary code execution. See [1] for a fuller >>> description and deeper discussion. >>> >>> This series implements a simple way of preventing such attacks: create a >>> config option, discovery.bare, that tells Git whether or not to die when it >>> finds a bare repository. discovery.bare has two values: >>> >>> * "always": always allow bare repositories (default), identical to current >>> behavior >>> * "never": never allow bare repositories >>> >>> and users/system administrators who never expect to work with bare >>> repositories can secure their environments using "never". discovery.bare has >>> no effect if --git-dir or GIT_DIR is passed because we are confident that >>> the user is not confused about which repository is being used. >> >> I'm not insisting that the entire approach here should be changed, but >> in the above exchange you seemed to have performance concerns about the >> "just walk up in setup.c" approach I mentioned, but it's not clear if >> that's still the only thing that necessitates taking this approach. >> >> There may be security subtleties that I've missed, but from the >> description here it seems like that would work equally well, and >> wouldn't require configuration, except insofar as we'd need to opt-in to >> reading config from bare repositores *that also exist in a parent tree*. >> >> And it would be a more narrow & more secure solution, since it would >> e.g. allow you to intentionally navigate to /var/repos/git/git.git in a >> server setup and read the config there, which it could distinguish from >> a case of /var/repos/.git existing, and git/git.git being brought in as >> a part of that "parent" repo. > > Performance is one major concern, yes, and I agree that your findings > show that the "just walk up" approach is cheap enough to consider doing. > Though in the few cases where it isn't cheap to walk, wouldn't it still > be useful to be able to opt out of it? Maybe, but until we at least have a reason to think that there is a performance concern this all seems rather hypothetical. In the thread(s) I linked to I noted that I tested my POC implementation on AIX, which has a sloooooooow filesystem. Why are you concerned about this having a performance impact? > The other concern is simplicity and correctness. Are we confident that > we'll get the design of "just walk up" correct (including edge cases > like "bare repo in bare repo in non bare repo")? I'm 100% confident that > we'll get it right eventually, and that this approach will be a good > default for all users. But in comparison, "never" is so much easier to > understand and implement that I don't see why we shouldn't start by > presenting this option to the 0.1-1% of users who would find it useful. If you run "git status" or whatever in a directory anywhere on your FS now it'll confidently tell you if you're not within a git repository. We're just talking about re-using that code, why would we be concerned that: 1. Finding a repo at a given <PATH> 2. Appending "/.." to that <PATH> 3. Feeding it to a "given a path, find me the git repo" (code setup.c already has) Is something we'd get wrong? > And on the topic of simplicity, there's significant interest in > maintaining backwards-compatibility with repos with workflows that > absolutely depend on embedded bare repos, e.g. libgit2 and Git-LFS. > That's yet another special case that we'd have to get right. Stolee's > "no-embedded" proposal [1] pretty much covers that, but I don't see the > harm in simplifying the design space by making bare repo support a > non-goal. > > [1] https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@xxxxxxxxxx Sure, and we could have a config know to still use such repos (for config or otherwise) with the "walk up" method, does that address your concern here? >> The "more narrow" and "more secure" go hand-in-hand, since if you work >> on such servers you'd turn this to "always" because you want to read >> such config, but then be left vulnerable to the actual (and muche rarer) >> exploit we're trying to prevent. > > The point that we're not defending bare repo users is fair, but maybe > the group we're trying to protect isn't really dedicated Git-serving > servers. This exploit requires you to have a bare repo inside the > working tree of a non-bare repo. So I think this is less of an issue for > a server, and more for "mixed-use" environments with both regular and > bare clones. Yes, but this is only something that's even a question because of an artificial limitation your proposal here suffers from. I.e. in trying to detect nefarious repos where you've got "looks like bare" content *tracked* in another repo you're conflating it with *any bare repo*. And the only reason we're doing that seems to me to be a premature optimization. >> Which, it seems... >> >>> This series does not change the default behavior, but in the long-run, a >>> "no-embedded" option might be a safe and usable default [2]. "never" is too >>> restrictive and unlikely to be the default. >> >> This series has (since v3?) been noting aspirations to have a >> "no-embedded" variant of this config, which your 5/5 here notes would be >> better, but isn't implemented by this series. >> >> But your 5/5 also notes: >> >> but detecting if a repository is embedded is potentially >> non-trivial, so this work is not implemented in this series. >> >> Hrm, well, the diff-stat isn't quite that trivial either :) : > > Well.. a lot of it is refactoring :P > >>> [...] >>> upload-pack.c | 27 ++++++---- >>> 12 files changed, 304 insertions(+), 47 deletions(-) >> >> In threads linked from the above ML link I linked to some POC code >> showing how to hack a second .git discovery walk into setup.c. This was >> as part of the "submodule parent dir" proposal, which is a different >> feature, but also needs such "find the parent" code: >> https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@xxxxxxxxxxxxxxxxxxx/ >> >> Now, obviously that's a dirty hack, but it's not that hard to just >> change the part of setup.c where we're satisfied that we've found the >> git dir, then walk up "$THAT_DIR/..", and start our search again. >> >> Then: >> >> if (first_dir_was_bare() && found_parent_dir()) >> enforce_no_embedded(); >> >> Isn't that what your proposed "no embedded" option would need to do? >> Well, maybe we'd also check if the "first dir" is in the index of the >> parent, as opposed to just being a bare .git somewhere in ~/Downloads, >> e.g. if you have a ~/.git and keep your dot-files in git. >> >> But I think for an initial implementation just doing the walk would be >> good enough, and would have a more narrow scope than this configuration >> setting. > > A narrow scope is good, but I don't agree on this definition of > "narrow". My preference is to give an obvious solution to a 'narrow' > group of users, instead of a more tricky solution that affects all users > in a 'narrow' set of cases. We could still have an option to say "I never want to consider bare repos from config", but if we're able to distinguish *tracked* bare repos from non-tracked ones this is entirely unrelated to the initial stated motivation for this series, isn't it? I.e. to solve the security issue at hand. I've got nothing against *also* providing a "I never want config from xyz repos", but that seems to be orthagonal. >> AFAICT the performance concerns aren't supported by any data, in the >> case of the "submodule superproject" feature it turned out to not be the >> directory walk, but us shelling out in a loop in git-submodule.sh. >> >> Well, *maybe* that's not the case, I think I have managed to read >> between the lines of some of these past exchanges that there's some odd >> propriterary internal NFS-like setup at Google where *parent dirs* are >> auto-mounted and searched on access, so a "walk up" pattern would be >> much more expensive. >> >> I do worry a bit about us ending up with design choices in git that we >> wouldn't have ended up with, if not to cater to some in-house setup >> somwhere that 99.99% of git users will never see. > > At the very least, I don't think you're saying that it's a bad idea to > have "never", just that we might not have come up with it if not for > some Google NFS thing. I'm just speculating as to why we've ended up with this approach, but maybe I'm wrong. > Another use case I can think of is CI bots, which have no need for bare > repos. To some folks (maybe in very security-sensitive environments), > "never" might give more peace of mind than "no-embedded". Sure, but again, I've got nothing against having *more config knobs*, and I've often e.g. wanted some way to tell git to read no config at all (which you can sort of do with the right environment variables, but it's a hassle). But I don't see how that's not unrelated to addressing the issue this series aims to address, and what I'm pointing out that it's doing so with a method that's less accurate than the "walk up" method, and less secure.