On Sat, May 14, 2011 at 10:50, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Shawn Pearce <spearce@xxxxxxxxxxx> writes: > >> If we are going to put limits in, does it make sense to try and push >> these limits back to pack-objects in a more detailed way? You talked >> about depth of history, or size of pack. pack-objects could >> approximate both. > > I think the receiving end switches between index-pack and unpack-objects > based on the object count, It does... > but if we can add a protocol extension to carry > these estimates from the sender, we would be much better off. A large push > with a small number of objects would want to keep everything in a pack, > for example, but currently there is no way to do so. Yes, my thoughts exactly. A 600 MiB push of 3 objects (1 huge blob, top level tree, commit) shouldn't be unpacked. Serving that huge blob back out to clients that want to fetch it is more efficient when its coming from a pack than from a loose object, due to the object reuse that is available from a pack. Unfortunately we don't have the data necessary to make that decision. Probably the cleanest way to do this estimate transmission is to modify the pack stream format. Introduce a new capability for pack-stats. If advertised from receive-pack, send-pack can pass a --pack-stats flag to pack-objects. That process then writes out a PACK header with version code 4 (rather than 2), and includes additional data beyond the object count in an extended PACK header. receive-pack can then read the PACK header and handle version 2 as it does today, and version 4 by also checking the additional stat fields and dumping what is left over into either unpack-objects or index-pack like it does now. We wouldn't even need to change index-pack or unpack-objects, receive-pack could translate the PACK version 4 header into a version 2 command line flag when it invokes the sub process to handle the stream. Hmm, or maybe this should be pack v5, just so nobody confuses it with the mythical pack v4 we haven't implemented. If we do add this stats header to the PACK stream, we need to be a bit smarter about it. I would lay out the header with an additional 4 byte field telling us the size of the stats block, and then use repeating key-value pairs within the stats block to transmit the data. This way clients can add additional stats to the stream and receive-pack can ignore them if it doesn't understand them. We can start out with number of commits and estimated size of the packed data (based on summing up the reused object sizes) but could add more later like date of oldest commit if we later decide it would help the remote handle the pack stream, or reject it early. As far as receive-pack rejecting the stream after seeing the header, I would like to avoid an extra round-trip between the two peers. So we probably need to modify send-pack like I suggest above to check the stream for an error message after pack-objects terminates, even if pack-objects terminates with a non-zero exit status. Even over a POSIX pipe we should have room for 512 bytes of error data to be crammed into the stream from receive-pack before we deadlock the two peers. If we wanted to be really paranoid, we would run pack-objects in the background and have send-pack constantly monitor the incoming half of the pipe for a status result. Since pack-objects is already another process this shouldn't be that difficult. (But it is harder for me in JGit where its not a separate thread.) -- Shawn. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html