On Fri, Apr 15, 2022 at 6:28 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > On Fri, Apr 15, 2022 at 05:41:59PM -0700, Glen Choo wrote: > > * We want additional protection on the client besides `git fsck`; there are > > a few ways to do this: > > I'm a little late to chime into the thread, but I appreciate you > summarizing some of the approaches discussed so far. Let me add my > thoughts on each of these in order: > > > 1. Prevent checking out an embedded bare repo. > > 2. Detect if the bare repo is embedded and refuse to work with it. > > 3. Detect if the bare repo is embedded and do not read its config/hooks, but > > everything else still 'works'. > > 4. Don't detect bare repos. > > 5. Only detect bare repos that are named `.git` [1]. > > > > (I've responded with my thoughts on each of these approaches in-thread). > > 1. Likely disrupts too many legitimate workflows for us to adopt > without designing some way to declare an embedded bare repository > is "safe". > 2. Ditto. > 3. This seems the most promising approach so far. Similar to (1), I > would also want to make sure we provide an easy way to declare a > bare repository as "safe" in order to avoid permanently disrupting > valid workflows that have accumulated over the past >15 years. I'd like to think a little more about how we want to determine that a bare repo isn't embedded - wantonly scanning up the filesystem for any gitdir above the current one is really expensive. When I tried that approach for the purposes of including some shared config between superproject and submodules, it slowed down the Git test suite by something like 3-5x. So, I suppose that "we think this is bare" means refs/ and objects/ present in a directory that isn't named '.git/', and then we hunt anywhere above us for another .git/? Of course being careful not to accept another bare repo as the "parent" gitdir. Does it work for submodules? I suppose it works for non-bare submodules - and for bare submodules, e.g. 'submodule-having-project.git/modules/some-submodule/' we can rely on the user to turn that flag on if they want their submodule's config and hooks to be noticed from the gitdir. (From 'worktree-for-submodule-having-project/some-submodule' there is a '.git' pointer, so I'd expect things to work normally that way, right?) As long as we are careful to avoid searching the filesystem in *every* case, this seems like a pretty reasonable approach to me. > 4. Seems like this approach is too heavy-handed. > 5. Ditto. > > > With that in mind, here's what I propose we do next: > > > > * Merge the `git fsck` patch [2] if we think that it is useful despite the > > potentially huge number of false positives. That patch needs some fixing; I'll > > make the changes when I'm back. > > If there are lots of false positives, we should consider downgrading the > severity of the proposed `EMBEDDED_BARE_REPO` fsck check to "info". I'm > not clear if there is another reason why this patch would have a > significant number of false positives (i.e., if the detection mechanism > is over-zealous). > > But if not, and this does detect only legitimate embedded bare > repositories, we should use it as a reminder to consider how many > use-cases and workflows will be affected by whatever approach we take > here, if any. > > > * I'll experiment with (5), and if it seems promising, I'll propose this as an > > opt-in feature, with the intent of making it opt-out in the future. We'll > > opt-into this at Google to help figure out if this is a good default or not. > > > > * If that direction turns out not to be promising, I'll pursue (1), since that > > is the only option that can be configured per-repo, which should hopefully > > minimize the fallout. > > Here's an alternative approach, which I haven't seen discussed thus far: > > When a bare repository is embedded in another repository, avoid reading > its config by default. This means that most Git commands will still > work, but without the possibility of running any "executable" portions > of the config. To opt-out (i.e., to allow legitimate use-cases to start > reading embedded bare repository config again), the embedding repository > would have to set a multi-valued `safe.embeddedRepo` configuration. This > would specify a list of paths relative to the embedding repository's > root of known-safe bare repositories. > > I think there are a couple of desirable attributes of this approach: > > - It minimally disrupts bare repositories, restricting the change to > only embedded repositories. > - It allows most Git commands to continue working as expected (modulo > reading the config), hopefully making the population of users whose > workflows will suddenly break pretty small. > - It requires the user to explicitly opt-in to the unsafe behavior, > because an attacker could not influence the embedding repository's > `safe.embeddedRepo` config. > > If we were going to do something about this, I would strongly advocate > for something that resembles the above. Or at the very least, some > solution that captures the attributes I outlined there. Nice - a more strict spin on proposal 3 from above, if I understand it right. Rather than allowing any and all bare repos, avoid someone sneaking in a malicious one next to legitimate ones by using an allowlist. Seems reasonable to me. - Emily