On Tuesday, May 22, 2012 05:23:17 pm Junio C Hamano wrote: > Martin Fick <mfick@xxxxxxxxxxxxxx> writes: > > I think we can ignore that 400 part from the above for > now. As they are independent "projects", it is > unreasonable to expect that any solution would add costs > less than linearly when you add more projects. As far as git is concerned, sure they are independent, but the projects aren't really independent. Separate repos were likely selected for scalibility reasons in the first place, but this ends up being a "scale fail" pattern. I don't expect git to fix this * 400 issue, but I want to highlight, how difficult it is to make things scale in these cases and that this problems is a natural multiplier when you split something into separate git repos because your tags tend to now be multiplied. So while 10K tags sound like a lot but manageable for 1 repo, it ends up actually being 4M tags if you split your repos. ... > I think the assumption in your build-bot scenario needs > to be a bit more clarified to give context. I am > assuming, when you say "see if projects are up to date", > you mean your build-bot is interested in a single branch > in each repository, possibly together with new tags that > have become reachable from the updated tip of the branch > (if the branch tip moved). Just the branch generally. > I also assume that the > build-bot polls very often and that is why 200MB (or > divided by 400 it would be 0.5MB, which still is a lot > when you talk about a single repository) hurts a lot. To clarify for others (since you likely know), we serialize our submits, so the "poll" is before every build to ensure that we are up to date with what just got merged. In our case we use Gerrit with slaves (read only Gerrit copies), so we offload the "update" to the slave, which incurs the same hit, all though it is not quite a No Op but close (there will have been a batch of commits added to a few projects). But then since the slave may be slightly behind the master (replication from the master to the slave takes time, partially because of the same ref advertisement problem), so we want to verify the slave update by polling the master, which if replication was fast enough would be a No Op. So, our attempts to scale by using slaves does not help very much since the real update and the No Op are almost identical in their impacts... There are workarounds, but mostly hacks. ... > Unlike the capability exchange that happens after the > initial connection, there is no gap in the protocol for > the side that initiates the connection to tell the > serving side to skip/delay the initial ref > advertisement. Crazy stupid suggestion: could the URI somehow be exploited to signal the option to disable advertisements? What if the path part of the URI started with something like several slashes? Older servers just strip the extra slashes right? Newer servers could disable advertisements if they see the slashes. Of course, this gets ugly if scripts add slashes and expect them to be stripped (old clients talking to new servers would probably then fail), -Martin -- Employee of Qualcomm Innovation Center, Inc. which is a member of Code Aurora Forum -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html