brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> writes: > On 2024-03-14 at 23:29:40, Junio C Hamano wrote: > > "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes: > > > As a server operator, I also expect that there > > > are people doing lots of needless attempts at pushing in automated > > > systems (because with enough users, there will be at least some who do > > > bizarre or inefficient things), and I would prefer to avoid serving > > > those requests if I don't need to. (For example, for us, reference > > > updates need to go through a distributed commit protocol to update > > > multiple replicas of the repository, and if there's no ref updates, then > > > we cut out multiple services which we don't need to contact.) > > > > Yes, but if you have an extra no-op ref update in the bunch, that > > one is excluded from the set of changes to be synchronised across > > replicas, no? > > > As I said, we can't do that, because we have to verify that the old > value is correct. Even if we did perform that optimization, once we > know we have a write, we have to contact every replica (in case the user > does indeed send us a pack, which is allowed by the protocol, even if > bizarre), whereas with a ref request or a read, we can satisfy that with > a single read replica. I'm sure I can satisfy at least one to two > orders of magnitude more no-op push attempts with the current > optimization than I could if we didn't have it. It might even be to the > point of three or more orders of magnitude. (My guess is that reftable > will improve this even further, but I haven't yet measured.) > > > I'm sure this is also true for most other implementations; serving > reads, like ls-refs, are almost always substantially cheaper than writes > and most server implementations are highly optimized for serving large > numbers of reads due to CI farms and such. > > > > > Note also that no-op ref updates cannot be simply omitted on the server > > > side because we need to verify that the old value for the ref is > > > correct, or we need to reject the operation as out of date. > > > > Yes, but isn't that something the user would rather want to find out > > earlier rather than later? Your push without no-op update may say > > "ah, we are up to date" when we are behind or worse diverged. If we > > do the no-op update more often, we'd have more chance to catch such > > a situation where the worldview the end-user's repository has is out > > of sync with reality. > > > Well, we were up to date at the time the ls-refs operation completed, or > the push wouldn't have the same old and new OIDs. The client has to > declare the old ref value that it got from the ls-refs output in order > for the operation to succeed, since the server-side Git won't update the > references if the old value doesn't match. (This is how the client > knows whether a push is a fast-forward or not: it computes it, and if > the remote side has an unknown value, then it's not.) Other than > updating the push options, there's no functional benefit to making a > second request, since the ls-refs output already told us where the > branch is. That's why the current optimization is functionally correct. > > > In any event, there are alternate implementations of the Git protocol > which did not implement our current optimization and did, for purposes > of mirroring, make excessive numbers of no-op requests, and we did have > to ask for that to be fixed. I assure you, the no-op behaviour is > better for users because it's faster and substantially more likely to > succeed (because it is effectively a read-only ls-refs request), whereas > a new ref update is less efficient, slower, and substantially more > likely to fail (due to functional limitations on write throughput). > > > I also expect there will be poorly written `pre-push` and `pre-receive` > hooks which will not like getting a no-op update and not handle it > gracefully. I'm pretty sure Git LFS will handle this just fine, but > lots of people have various shell scripts that probably will not. Is this a potential avenue for DoS? Clearly, I'm not the first to send no-op ref updates, so this change doesn't create the problem, but it could allow non-coders to exploit existing problems much more easily. If this could be a problem, we can take the time now to solve it. A simple limit on the number of no-op refs received before the server terminates with REF_STATUS_REMOTE_REJECT should be sufficient, right? If we bake that expectation into the official Git implementation, then other implementations will be empowered to take similar steps. For at least one service[1], GitLab only supports Push Options on, at most, 10 refs. If other server operators have similar limitations, then this is an easy bud to nip. Regards, Chris. [1] https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/merge_requests/push_options_handler_service.rb?ref_type=heads#L5