Hi all, My Google colleagues and I would like to address what we think is a security risk - namely that Git repositories can contain valid bare repositories in their working trees. This is based on an excellent article by Justin Steven [1] (CC-ed here). Below is a discussion of: * The risky behavior - what Git does and why it is risky * Possible responses to this risk * A proposed approach The proposed changes are nontrivial, so I’d really appreciate any feedback here. Unfortunately, I will be out of the office and won’t respond to emails for the next 7 days or so, but there will still be at least one Google Git team member keeping tabs on the discussion :) = TL;DR Git repositories should not be allowed to contain bare repositories in their working trees because: * Such bare repositories can have maliciously crafted config files that cause `git` to execute arbitrary code. * Git itself can distribute the malicious repo via `git clone`; no need to distribute repos out of band e.g. via tarballs [2]. * Many `git` commands can be affected by malicious config files, and many users have tools that will run `git` in the current directory or the subdirectories of a repo. Once the malicious repo has been cloned, very little social engineering is needed; the user might only need to open the repo in an editor or `cd` into the correct subdirectory. = Background (This section is primarily a summary of [1]. I highly, highly recommend reading that as it describes the issue in much more detail and is extremely readable regardless of Git experience.) Certain Git configuration options are particularly enticing targets for attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked on many innocuous-looking `git` commands (like `git status`). This is even more risky when one considers that many tools (like shell prompts and IDEs) will run `git` opportunistically inside a repository - so many users won't even need to explicitly run `git` to be affected [3]. Since config files are such an enticing target for attackers, Git intentionally avoids distributing config files with repos - a user shouldn't be able to `git clone` a repo with a config file (or really, any files inside .git). However, one can 'trick' Git into doing this by embedding a bare repository inside of another, containing repository: a repository can contain subdirectories that are valid bare repositories, and any `git` operations run in such a subdirectory will then use the bare repository’s config file instead of the "main" repository’s. An attack might look like this: * Attacker creates a repository where subdirectory "Documentation/" is a bare repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker also adds "config" with a malicious setting for core.fsmonitor. * Attacker convinces User to read their project's documentation by `git clone`-ing their repository and inspecting the "Documentation/" directory. * User cd-s into "Documentation/" and their shell prompt runs `git status`, executing the core.fsmonitor command defined by Attacker. = What can we do about it? Each subsection is an alternative and an analysis (+/- are pros/cons). == 1. Prevent users from checking out bare repos This is similar to an existing mitigation where we prevent repository entries that can be confused for ".git" (like ".GIT"). but it requires checking multiple entries instead of a single entry. I suspect that we could accomplish this in one of two ways: a. Prevent bare repos from entering the index. b. Prevent writing bare repos to the working tree. + Relatively robust protection - because the malicious repo never enters the working tree, we even protect other tools (e.g. JGit) from doing dangerous things in the embedded repo (provided the checkout is done with `git`, of course). - This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a more convenient alternative to submodules), but it seems reasonable to let users opt out of this behavior. - (1a) is difficult to do in practice because many code paths add entries to the index, and checking a combination of new entry and existing entries is much trickier than checking just the new entry. - (1b) might also be difficult, though not as difficult as 1a because we already have a complete list of entries we will write. I don’t think there are existing facilities that do this sort of checking of multiple entries. == 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`. This entails checking for the markers of a bare repository (HEAD, refs/, objects/) in tree objects. This shares a precedent with (1), since `git fsck` will also detect ".GIT". + Most reputable hosting sites set `transfer.fsckObjects`, which allows them to detect and prevent this kind of transmission. + Confers some protection to users even without them doing anything. + Easy to visualize and to write. - This won’t directly protect most users because they don’t typically set `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many projects, like linux.git, uncloneable without additional config) - Won’t guard against malicious/poorly configured hosts. == 3. Detect that we are in an embedded bare repo and ignore the embedded bare repository in favor of the containing repo. For example, if setup.c detects that we are in a bare repo, it could then walk up the directory hierarchy to see if there is a containing repo that tracks the bare one. If so, setup.c chooses to use the containing repo instead. + Extremely robust; this even protects against a checkout by an earlier Git version. + Users who don’t use bare repos won’t even notice the difference. - The change in rules breaks some legitimate workflows e.g. a user with a repo at HOME and bare repos underneath. - Potentially very expensive for bare repo users because setup.c will likely walk up to the root directory; we’d be essentially *forcing* those users to set GIT_CEILING_DIRECTORIES. - Doesn’t protect users who use other tools e.g. JGit. == 4. Educate users about this risk without making code changes. Some risks fall into this category e.g. "Avoid unarchiving repositories because .git/config might be poisoned." [2]. + We don’t break any users. - Breaks the trust that users have in `git clone`. - IMO "Inspect every repo in its entirety before cloning it" is too much of a burden to put on users. = Next steps I propose that we prevent repositories from containing bare repositories by doing the following (in order): * Implement (2) by adding a new fsck message "embeddedBareRepo". * When this is done, hosting sites can hopefully use this capability to prevent transmission, and help us understand the prevalence of such attacks. * Implement (1b) by teaching unpack_trees.c to check whether the tree contains an entire bare repo, and die() if so. This will be guarded by a defaults-to-true config value. * This would only block a bare repo from being written in a single operation. It wouldn’t stop a user from writing a bare repo entry-by-entry using "git checkout <path>", but the amount of social engineering required probably renders this attack infeasible. * As I noted earlier, I foresee some difficulty actually implementing this because I don’t think we have facilities for checking multiple tree entries at once. I am particularly interested in hearing feedback about (1b), namely: * How to actually teach unpack_trees.c to detect bare repos. * Whether preventing bare repos in unpack_trees.c is a good enough mitigation (e.g. Are there other code paths that should block bare repos? Should we be checking the index instead of the tree?). I have a patch that does (2); I will send that out and we can leave feedback on that separately. = Demonstration This is based on a script by Taylor Blau (thanks!). [1] also contains a demonstration that runs in Docker. #!/bin/sh rm -fr malicious cloned &&git init malicious && ( cd malicious && mkdir -p bare && cd bare && echo 'ref: refs/heads/main' >HEAD && cat >config <<-EOF [core] repositoryformatversion = 0 filemode = true bare = false worktree = "worktree" fsmonitor = "echo pwned >&2; false" EOF mkdir objects refs worktree && touch worktree/.gitkeep && git add . && git commit -m ".gitkeep" && cd .. && git add bare && git commit -m 'initial commit' ) && git clone --no-local malicious cloned && cd cloned/bare && git status # pwned = Footnotes [1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md. [2] Archived repositories with malicious .git directories are a known risk. See https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@xxxxxxxxxxxxxxxxxxxxx/ for an on-list discussion. This is also described in https://blog.sonarsource.com/securing-developer-tools-git-integrations (referenced in [1]). [3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn themselves with tab completion even if they don’t have a prompt configured [4] Any directory containing these entries will be recognized as a bare repo, so an attacker could add arbitrary entries to the directory to obfuscate the fact that it is a bare repo. [*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is similar to [1], but uses hooks instead of core.fsmonitor.