On Tue, Aug 1, 2017 at 10:38 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> Peff and I discussed off-list whether the lookup-by-SHA-1 feature is
>> so important in the first place. Currently, all references must be
>> scanned for the advertisement anyway,
>
> Not really. You can hide refs and allow-tip-sha1 so clients can fetch
> a ref even if it wasn't in the advertisement. We really want to use
> that wire protocol capability with Gerrit Code Review to hide the
> refs/changes/ namespace from the advertisement, but allow clients to
> fetch any of those refs if they send its current SHA-1 in a want line
> anyway.
>
> So a server could scan only the refs/{heads,tags}/ prefixes for the
> advertisement, and then leverage the lookup-by-SHA1 to verify other
> SHA-1s sent by the client.
>
>> so avoiding a second scan to vet
>> SHA-1s received from the client is at best going to reduce the effort
>> by a constant factor. Do you have numbers showing that this
>> optimization is worth it?
>
> No, but I don't think I need to do much to prove it. My 866k ref
> example advertisement right now is >62 MiB. If we do what I'm
> suggesting in the paragraphs above, the advertisement is ~51 KiB.

That being said, our bias towards minimizing the number of ref scans is rooted in our experience where scanning 866k refs takes 5 seconds to get the response from the storage backend into the git server. Cutting ref scans from 2 to 1 (or 1 to 0) is a big deal in that case.

But that 5s number is based on our current, slow storage, not on reftable. If migrating to reftable turns each 5s scan into a 400ms scan, we might be able to live with that, even if we don't have fast lookup by SHA-1.

>> OTOH a mythical protocol v2 might reduce the need to scan the
>> references for advertisement, so maybe this optimization will be more
>> helpful in the future?

I haven't been following the status of the proposal, but I was assuming a client-speaks-first protocol would also imply the client asking for refnames, not SHA-1s, in which case lookup by SHA-1 is no longer relevant.
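
For concreteness, here is a rough sketch of the hide-refs flow quoted earlier in this message: advertise only refs/heads/ and refs/tags/, then vet each want SHA-1 with an index lookup instead of a second full scan. This is a toy in-memory model, not Git's or Gerrit's actual code. The function names are hypothetical, and the dict built by build_sha_index only stands in for whatever persistent SHA-1 index the storage layer might provide.

```python
# Toy model of "advertise a subset, vet wants by SHA-1 lookup".
# All names here are hypothetical; this is not Git's or Gerrit's code.

ADVERTISED_PREFIXES = ("refs/heads/", "refs/tags/")  # assumed visible namespaces


def advertise(refs):
    """Return only the refs that fall under the advertised prefixes."""
    return {name: sha for name, sha in refs.items()
            if name.startswith(ADVERTISED_PREFIXES)}


def build_sha_index(refs):
    """Map each SHA-1 to the set of ref names whose tip is that SHA-1.

    In this sketch the index is built by one pass over the refs; it stands
    in for an index that the storage backend would maintain persistently.
    """
    index = {}
    for name, sha in refs.items():
        index.setdefault(sha, set()).add(name)
    return index


def vet_wants(wants, sha_index):
    """Split client 'want' SHA-1s into (accepted, rejected).

    A want is accepted if it is the current tip of some ref, hidden or
    not. Each check is a single lookup rather than a scan over all refs.
    """
    accepted, rejected = [], []
    for sha in wants:
        (accepted if sha in sha_index else rejected).append(sha)
    return accepted, rejected


refs = {
    "refs/heads/master": "a" * 40,
    "refs/tags/v1.0": "b" * 40,
    "refs/changes/01/1/1": "c" * 40,  # hidden from the advertisement
}
advertised = advertise(refs)
accepted, rejected = vet_wants(["c" * 40, "d" * 40], build_sha_index(refs))
```

With these example refs, the advertisement omits refs/changes/01/1/1, yet a want for its tip is still accepted, while a SHA-1 that is not the tip of any ref is rejected.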