On Tue, May 9, 2017 at 2:37 AM, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: > On Tue, May 09, 2017 at 02:00:18AM +0200, Ævar Arnfjörð Bjarmason wrote: >> On Tue, May 9, 2017 at 1:32 AM, brian m. carlson >> <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: >> > PCRE and PCRE2 also tend to have a lot of security updates, so I would >> > prefer if we didn't import them into the tree. It is far better for >> > users to use their distro's packages for PCRE, as it means they get >> > automatic security updates even if they're using an old Git. >> > >> > We shouldn't consider shipping anything with a remotely frequent history >> > of security updates in our tree, since people very frequently run old or >> > ancient versions of Git. >> >> I'm aware of its security record[1], but I wonder what threat model >> you have in mind here. I'm not aware of any parts of git (except maybe >> gitweb?) where we take regexes from untrusted sources. >> >> I.e. yes there have been DoS's & even some overflow bugs leading code >> execution in PCRE, but in the context of powering git-grep & git-log >> with PCRE this falls into the "stop hitting yourself" category. > > Just because you don't drive Git with untrusted regexes doesn't mean > other people don't. It's not a good idea to require a stronger security > model than we absolutely have to, since people can and will violate it. > Think how devastating Shellshock was even though technically nobody > should provide insecure environment variables to the shell. Yes this is definitely something we should keep in mind. I see my comment could be read as dismissing your concerns, I didn't mean that. This is definitely something to be concerned about. I was trying to solicit feedback on whether there were any other parts of stock git that could take external input as regexes than gitweb, I'm not aware of any. I thought that gitweb wouldn't take regexes by default, but I see now that the 'search' feature is on by default, and that allows you to grep / pickaxe with regexes. Either -E or -F for grep, or --pickaxe-regex (which implies -E) for log. > And, yes, gitweb does in fact call git grep. That means that git grep > must in fact be secure against untrusted regexes, or you have a remote > code execution vulnerability. Indeed, but it's not as clear-cut as turning on on PCRE making you vulnerable. * gitweb is vulnerable to CPU DoS now in its default configuration. It's easy to provide an ERE that ends up slurping up 100% CPU for several seconds on any non-trivial sized repo, do that in parallel & you have a DoS vector. * Obviously something that's 2x as fast or more (which my WIP code is) makes this better. * PCRE tends to be worse at pathological patterns, but this is because it has really large limits by default and will try rather insanely hard to match your pattern. -DHEAP_LIMIT=20000000 \ -DMATCH_LIMIT=10000000 \ -DMATCH_LIMIT_DEPTH=10000000 \ -DMAX_NAME_COUNT=10000 \ -DMAX_NAME_SIZE=32 \ -DPARENS_NEST_LIMIT=250 \ This can be reduced dynamically via the API (and the patterns can't change it, except to reduce it). For example on my system 2.11.0 (before any of my changes) on linux.git: $ time git grep -E -- '-?-?-?-?-?-?-?-?-?-?-?-----------$' |wc -l 12434 real 0m0.444s $ time git grep -P -- '-?-?-?-?-?-?-?-?-?-?-?-----------$' | wc -l 12434 real 1m20.849s With my JIT changes: $ time ~/g/git/git --exec-path=/home/avar/g/git grep -P -- '-?-?-?-?-?-?-?-?-?-?-?-----------$' | wc -l 12434 real 0m5.334s However for user-supplied patterns we can just turn on really conservative settings: $ time ~/g/git/git --exec-path=/home/avar/g/git grep -P -- '(*LIMIT_RECURSION=1000)(*LIMIT_MATCH=1000)-?-?-?-?-?-?-?-?-?-?-?-----------$' | wc -l fatal: pcre2_jit_match failed with error code -47: match limit exceeded 0 real 0m0.021s When I locally compile git with something like this: - -DMATCH_LIMIT=10000000 \ - -DMATCH_LIMIT_DEPTH=10000000 \ - -DMATCH_LIMIT_RECURSION=10000000 \ - -DMAX_NAME_COUNT=10000 \ + -DMATCH_LIMIT=1000 \ + -DMATCH_LIMIT_DEPTH=1000 \ + -DMATCH_LIMIT_RECURSION=1000 \ + -DMAX_NAME_COUNT=100 \ -DMAX_NAME_SIZE=32 \ - -DPARENS_NEST_LIMIT=250 \ + -DPARENS_NEST_LIMIT=10 \ All tests still pass, and from my own testing I can't find any non-pathological patterns this would break. So we might actually consider turning this down for git by default. * Much of the rest of the patterns PCRE has CVEs for can similarly be mitigated by simply turning various features off. > Furthermore, at work we distribute Git with all releases of our product. > We normally only do non-security updates to the last couple of releases, > but we provide security updates to all supported versions. I'm not > comfortable shipping the entirety of PCRE or PCRE2 to customers without > providing security updates, so you're going to make my job (and my > coworkers') a lot harder by shipping it. Please don't. I'm not going to make your job harder. This WIP patch I have for embedding PCRE2 as compat/pcre2 is for use by those who'd want to get a faster grep, but for whatever reason don't have a handy PCRE v2 package already, or wouldn't mind supporting a git that comes bundled with it. My comments about "git rm"-ing kwset & compat/regex were meant in the sense of "I have a branch where we don't use that anymore, and it's faster", not that you couldn't compile a git without PCRE for the forseeable future. My WIP branch does remove much of the non-PCRE code, but that's just because it was easier to hack up like that and not support a bunch of if pcre/else branches, I'll amend that and keep PCRE optional when I end up submitting those patches.