On Wed, Mar 02, 2016 at 12:31:16AM -0800, Junio C Hamano wrote:
> Josh Triplett <josh@xxxxxxxxxxxxxxxx> writes:
>
> > I think several simpler optimizations seem
> > preferable, such as binary object names, and abbreviating complete
> > object sets ("I have these commits/trees and everything they need
> > recursively; I also have this stack of random objects.").
>
> Given the way pack stream is organized (i.e. commits first and then
> trees and blobs that belong to the same delta chain together), and
> our assumed goal being to salvage objects from an interrupted
> transfer of a packfile, you are unlikely to ever see "I have these
> commits/trees and everything they need" that are salvaged from such
> a failed transfer.  So I doubt such an optimization is worth doing.

True for the resumable clone case.  For that optimization, I was
thinking of the "pull during the merge window" case that Al Viro was
also interested in optimizing.

> Besides it is very expensive to compute (the computation is done on
> the client side, so the cycles burned and the time the user has to
> wait is of much less concern, though); you'd essentially be doing
> "git fsck" to find the "dangling" objects.

Trading client-side computation for bandwidth can potentially be
worthwhile if you have plenty of local compute but a slow and metered
link.

> The list of what would be transferred needs to come in full from the
> server end, as the list names objects that the receiving end may not
> have seen, but the response by the client could be encoded much
> tightly.  For the full list of N objects from the server, we can
> think of your response to be a bitstream of N bits, each on-bit in
> which signals an unwanted object in the list.  You can optimize this
> transfer by RLE compressing the bitstream, for example.
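
A rough sketch of the bitstream idea quoted above, to make it concrete.
This is not an existing git protocol; the function names and the
placeholder object names are made up for illustration, and the
(bit, run length) pair encoding is only one of many possible RLE
layouts.

```python
from itertools import groupby

def unwanted_bitmap(server_list, have):
    """One bit per entry in the server's list; 1 marks an object the
    client already has and therefore does not want sent again."""
    return [1 if name in have else 0 for name in server_list]

def rle_encode(bits):
    """Collapse consecutive equal bits into (bit, run_length) pairs."""
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bits)]

def rle_decode(runs):
    """Inverse of rle_encode: expand the pairs back into a flat bit list."""
    bits = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits

# Placeholder names, not real object IDs.
server_list = ["a1", "b2", "c3", "d4", "e5", "f6"]
have = {"c3", "d4", "e5"}

bits = unwanted_bitmap(server_list, have)
runs = rle_encode(bits)
```

The bits only mean something relative to the server's list, so the
receiver of `runs` has to be looking at the same list in the same
order to decode them.
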
>
> As git-over-HTTP is stateless, however, you cannot assume that the
> server side remembers what it sent to the client (instead, the
> client side needs to re-post what it heard from the server in the
> previous exchange to allow the server side to use it after
> validating).  So "objects at these indices in your list" kind of
> optimization may not work very well in that environment.  I'd
> imagine that an exchange of "Here are the list of objects", "Give me
> these objects" done naively in full 40-hex object names would work
> OK there, though.

Good point.  Between statelessness and Duy's point about the client list
usually being smaller than the server list, perhaps it would make sense
to not have the server send a list at all, and just have the client send
its own list.

- Josh Triplett
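
For a sense of scale, here is a minimal sketch comparing a client list
sent as 40-hex object names with the same list sent as binary object
names.  The helper names and the sample names (SHA-1 digests of
arbitrary strings) are assumptions for illustration only, not part of
any git wire format.

```python
import hashlib

OBJECT_NAME_BYTES = 20  # a SHA-1 object name is 20 bytes, i.e. 40 hex digits

def pack_names(hex_names):
    """Concatenate object names as raw 20-byte values."""
    return b"".join(bytes.fromhex(name) for name in hex_names)

def unpack_names(blob):
    """Split a packed blob back into 40-hex object names."""
    if len(blob) % OBJECT_NAME_BYTES != 0:
        raise ValueError("truncated object-name list")
    return [blob[i:i + OBJECT_NAME_BYTES].hex()
            for i in range(0, len(blob), OBJECT_NAME_BYTES)]

# Stand-in object names: SHA-1 digests of arbitrary strings.
names = [hashlib.sha1(b"example %d" % i).hexdigest() for i in range(3)]

hex_payload = "\n".join(names).encode("ascii")  # one 40-hex name per line
bin_payload = pack_names(names)                 # fixed-width binary names
```

Each name costs 20 bytes in binary against 40 bytes plus a separator
in hex, so the binary form is roughly half the size before any
further compression.
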