On Mon, Oct 31, 2022 at 03:45:47PM +0100, Patrick Steinhardt wrote: > Indeed, I'm not trying to say that this can lead to repository > corruption. If at all you can argue that this is more security-related. > Suppose an object is not reachable from any public reference and that > `allowAnySHA1InWant=false`. Then you could make these hidden objects > reachable by sending a packfile with an object that references the > hidden object. It naturally requires you to somehow know about the > object ID, so I don't think this is a critical issue. I'd have to double check, but isn't this all moot with the v2 protocol anyway? I didn't think it even respected allowAnySHA1InWant. Even that aside, there are other tricks you can do. E.g., pushing up an object which claims to be a delta of X (though you have the trick then of figuring out how to reference it), or pushing up a test object then fetching it back while claiming to have X, treating the server as an oracle which may give you a delta against X. In short, I don't think it's worth considering unreachable objects in a repository to be considered secret. > > Why do we see a slowdown when there there aren't any hidden references? > > Or am I misunderstanding your patch message which instead means "we see > > a slow-down when there are no hidden references [since we still must > > store and enumerate all advertised references]"? > > I have tried to dig down into the code of `revision.c` but ultimately > returned empty-handed. I _think_ that this is because of the different > paths we use when reading revisions from stdin as we have to resolve the > revision to an OID first, which is more involved than taking the OIDs as > returned by the reference backend. I have tried to short-circuit this > logic in case the revision read from stdin is exactly `hash_algo->hexsz` > long so that we try to parse it as an OID directly instead of trying to > do any of the magic that is required to resolve a revision. But this > only speed things up by a small margin. > > Another assumption was that this is overhead caused by using stdin > instead of reading data from a file, but flame graphs didn't support > this theory, either. Certainly read_revisions_from_stdin() will allocate a bunch of extra data per item it reads (via add_rev_cmdline() and add_pending), which is going to be way more than parsing or stdin overhead. Much worse is that it will call get_reference(), which is very keen to actually open and parse the object (as you know, since you added commit-graph handling there). That might have gotten a bit better in v2.38.0 if you have any references to blobs (as we'd now skip the extra hash check). Of course the original "rev-list --not --all" would suffer from the same thing, so some of that may not explain any difference between the two. -Peff