> On 25 Aug 2016, at 20:46, Stefan Beller <sbeller@xxxxxxxxxx> wrote: > >> On Thu, Aug 25, 2016 at 4:07 AM, <larsxschneider@xxxxxxxxx> wrote: >> From: Lars Schneider <larsxschneider@xxxxxxxxx> >> >> packet_write_stream_with_flush_from_fd() and >> packet_write_stream_with_flush_from_buf() write a stream of packets. All >> content packets use the maximal packet size except for the last one. >> After the last content packet a `flush` control packet is written. >> >> packet_read_till_flush() reads arbitrary sized packets until it detects >> a `flush` packet. > > So the API provided by these read/write functions is intended > to move a huge chunks of data. And as it puts the data on the wire one > packet after the other without the possibility to intervene and e.g. send > a side channel progress bar update, I would question the design of this. > If I understand correctly this will be specifically used for large > files locally, > so e.g. a file of 5 GB (such as a virtual machine tracked in Git), would > require about 80k packets. Peff suggested this approach arguing that the overhead is neglectable: http://public-inbox.org/git/20160720134916.GB19359@xxxxxxxxxxxxxxxxxxxxx/ > Instead of having many packets of max length and then a remainder, > I would suggest to invent larger packets for this use case. Then we can > just send one packet instead. > > Currently a packet consists of 4 bytes indicating the length in hex > and then the payload of length-4 bytes. As the length is in hex > the characters in the first 4 bytes are [0-9a-f], we can easily add another > meaning for the length, e.g.: > > A packet starts with the overall length and then the payload. > If the first character of the length is 'v' the length is encoded as a > variable length quantity[1]. The high bit of the char indicates if > the next char is still part of the length field. The length must not exceed > LLONG_MAX (which results in a payload of 9223 Petabyte, so > enough for the foreseeable future). Eventually I would like to resurrect Joey's cleanFromFile/smudgeToFile idea: http://public-inbox.org/git/1468277112-9909-3-git-send-email-joeyh@xxxxxxxxxx/ Then we would not need to transfer that much data over the pipes. However, I wonder if the large amount of packets would actually be a problem. Honestly, I would prefer to not change Git's packet format in this already large series ;-) > [1] A variable-length quantity (VLQ) is a universal code that uses > an arbitrary number of bytes to represent an arbitrarily large integer. > https://en.wikipedia.org/wiki/Variable-length_quantity > > The neat thing about the packet system is we can dedicate packets > to different channels (such as the side channels), but with the provided > API here this makes it impossible to later add in these side channel > as it is a pure streaming API now. So let's remove the complication > of having to send multiple packets and just go with one large packet > instead. I tried to design the protocol as flexible as possible for the future with a version negotiation and a capabilities list. Therefore, I would think it should be possible to implement these ideas in the future if they are required. > -- > I understand that my proposal would require writing code again, > but it has also some long term advantages in the networking stack > of Git: There are some worries that a capabilities line in fetch/push > might overflow in the far future, when there are lots of capabilities. > > Also a few days ago there was a proposal to add all symbolic refs > to a capabilities line, which Peff shot down as "the packet may be > too small". > > There is an incredible hack that allows transporting refs > 64kB IIRC. > > All these things could go away with the variable length encoded > packets. But to make them go away in the future we would need > to start with these variable length packets today. ;) > > Just food for thought. Thanks for thinking it through that thoroughly! I understand your point of view and I am curious what others thing. Cheers, Lars -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html