Hi Glen, This is an interesting find. Thanks for writing it up. On 6 Apr 2022, at 18:43, Glen Choo wrote: > Hi all, > > My Google colleagues and I would like to address what we think is a security > risk - namely that Git repositories can contain valid bare repositories in their > working trees. This is based on an excellent article by Justin Steven [1] (CC-ed > here). > > Below is a discussion of: > > * The risky behavior - what Git does and why it is risky > * Possible responses to this risk > * A proposed approach > > The proposed changes are nontrivial, so I’d really appreciate any feedback here. > Unfortunately, I will be out of the office and won’t respond to emails for the > next 7 days or so, but there will still be at least one Google Git team member > keeping tabs on the discussion :) > > = TL;DR > > Git repositories should not be allowed to contain bare repositories in their > working trees because: > > * Such bare repositories can have maliciously crafted config files that cause > `git` to execute arbitrary code. > * Git itself can distribute the malicious repo via `git clone`; no need to > distribute repos out of band e.g. via tarballs [2]. > * Many `git` commands can be affected by malicious config files, and many users > have tools that will run `git` in the current directory or the subdirectories > of a repo. Once the malicious repo has been cloned, very little social > engineering is needed; the user might only need to open the repo in an editor > or `cd` into the correct subdirectory. > > = Background > > (This section is primarily a summary of [1]. I highly, highly recommend reading > that as it describes the issue in much more detail and is extremely readable > regardless of Git experience.) > > Certain Git configuration options are particularly enticing targets for > attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked > on many innocuous-looking `git` commands (like `git status`). This is even more > risky when one considers that many tools (like shell prompts and IDEs) will run > `git` opportunistically inside a repository - so many users won't even need to > explicitly run `git` to be affected [3]. > > Since config files are such an enticing target for attackers, Git intentionally > avoids distributing config files with repos - a user shouldn't be able to `git > clone` a repo with a config file (or really, any files inside .git). However, > one can 'trick' Git into doing this by embedding a bare repository inside of > another, containing repository: a repository can contain subdirectories that are > valid bare repositories, and any `git` operations run in such a subdirectory > will then use the bare repository’s config file instead of the "main" > repository’s. > > An attack might look like this: > > * Attacker creates a repository where subdirectory "Documentation/" is a bare > repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker > also adds "config" with a malicious setting for core.fsmonitor. > * Attacker convinces User to read their project's documentation by `git > clone`-ing their repository and inspecting the "Documentation/" directory. > * User cd-s into "Documentation/" and their shell prompt runs `git status`, This might be obvious to most reading this, but for those trying this out at home, you'll need to create a worktree that points to the bare repository and then run `git status` in there. > executing the core.fsmonitor command defined by Attacker. This is easily reproduceable. > > = What can we do about it? > > Each subsection is an alternative and an analysis (+/- are pros/cons). > > == 1. Prevent users from checking out bare repos > > This is similar to an existing mitigation where we prevent repository entries > that can be confused for ".git" (like ".GIT"). but it requires checking multiple > entries instead of a single entry. I suspect that we could accomplish this in > one of two ways: > > a. Prevent bare repos from entering the index. > b. Prevent writing bare repos to the working tree. > > + Relatively robust protection - because the malicious repo never enters the > working tree, we even protect other tools (e.g. JGit) from doing dangerous > things in the embedded repo (provided the checkout is done with `git`, of > course). > - This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a > more convenient alternative to submodules), but it seems reasonable to let > users opt out of this behavior. > - (1a) is difficult to do in practice because many code paths add entries to > the index, and checking a combination of new entry and existing entries is > much trickier than checking just the new entry. > - (1b) might also be difficult, though not as difficult as 1a because we > already have a complete list of entries we will write. I don’t think there > are existing facilities that do this sort of checking of multiple entries. > > == 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`. > > This entails checking for the markers of a bare repository (HEAD, refs/, > objects/) in tree objects. This shares a precedent with (1), since `git fsck` > will also detect ".GIT". > > + Most reputable hosting sites set `transfer.fsckObjects`, which allows them to > detect and prevent this kind of transmission. > + Confers some protection to users even without them doing anything. > + Easy to visualize and to write. > - This won’t directly protect most users because they don’t typically set > `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many > projects, like linux.git, uncloneable without additional config) > - Won’t guard against malicious/poorly configured hosts. > > == 3. Detect that we are in an embedded bare repo and ignore the embedded bare > repository in favor of the containing repo. > > For example, if setup.c detects that we are in a bare repo, it could then walk > up the directory hierarchy to see if there is a containing repo that tracks the > bare one. If so, setup.c chooses to use the containing repo instead. > > + Extremely robust; this even protects against a checkout by an earlier Git > version. > + Users who don’t use bare repos won’t even notice the difference. > - The change in rules breaks some legitimate workflows e.g. a user with a repo > at HOME and bare repos underneath. > - Potentially very expensive for bare repo users because setup.c will likely > walk up to the root directory; we’d be essentially *forcing* those users to > set GIT_CEILING_DIRECTORIES. > - Doesn’t protect users who use other tools e.g. JGit. > > == 4. Educate users about this risk without making code changes. > > Some risks fall into this category e.g. "Avoid unarchiving repositories because > .git/config might be poisoned." [2]. > > + We don’t break any users. > - Breaks the trust that users have in `git clone`. > - IMO "Inspect every repo in its entirety before cloning it" is too much of a > burden to put on users. > > = Next steps > > I propose that we prevent repositories from containing bare repositories by > doing the following (in order): > > * Implement (2) by adding a new fsck message "embeddedBareRepo". > * When this is done, hosting sites can hopefully use this capability to > prevent transmission, and help us understand the prevalence of such attacks. > * Implement (1b) by teaching unpack_trees.c to check whether the tree contains > an entire bare repo, and die() if so. This will be guarded by a > defaults-to-true config value. > * This would only block a bare repo from being written in a single operation. > It wouldn’t stop a user from writing a bare repo entry-by-entry using "git > checkout <path>", but the amount of social engineering required probably > renders this attack infeasible. > * As I noted earlier, I foresee some difficulty actually implementing this > because I don’t think we have facilities for checking multiple tree entries > at once. > > I am particularly interested in hearing feedback about (1b), namely: > > * How to actually teach unpack_trees.c to detect bare repos. > * Whether preventing bare repos in unpack_trees.c is a good enough mitigation > (e.g. Are there other code paths that should block bare repos? Should we be > checking the index instead of the tree?). > > I have a patch that does (2); I will send that out and we can leave feedback on > that separately. > > = Demonstration > > This is based on a script by Taylor Blau (thanks!). [1] also contains a > demonstration that runs in Docker. > > #!/bin/sh > > rm -fr malicious cloned &&git init malicious && > > ( > cd malicious && > > mkdir -p bare && > cd bare && > > echo 'ref: refs/heads/main' >HEAD && > cat >config <<-EOF > [core] > repositoryformatversion = 0 > filemode = true > bare = false > worktree = "worktree" > fsmonitor = "echo pwned >&2; false" > EOF > > mkdir objects refs worktree && > touch worktree/.gitkeep && > > git add . && > git commit -m ".gitkeep" && > > cd .. && > git add bare && > git commit -m 'initial commit' > ) && > > git clone --no-local malicious cloned && > cd cloned/bare && > git status # pwned > > = Footnotes > > [1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md. > [2] Archived repositories with malicious .git directories are a known risk. See > https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@xxxxxxxxxxxxxxxxxxxxx/ > for an on-list discussion. This is also described in > https://blog.sonarsource.com/securing-developer-tools-git-integrations > (referenced in [1]). > [3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn > themselves with tab completion even if they don’t have a prompt configured > [4] Any directory containing these entries will be recognized as a bare repo, so > an attacker could add arbitrary entries to the directory to obfuscate the fact > that it is a bare repo. > [*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is > similar to [1], but uses hooks instead of core.fsmonitor.