> On 30 Jul 2016, at 01:11, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>
> On 2016-07-29 at 19:35, Junio C Hamano wrote:
>> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
>>
>>> I think sending it upfront is nice for buffer allocations of big files
>>> and it doesn't cost us anything to do it.
>>
>> While I do NOT think "total size upfront" MUST BE avoided at all costs,
>> I do not think the above statement to justify it makes ANY sense.
>>
>> Big files are by definition something you cannot afford to hold in
>> their entirety in core, so you do not want to be told that you'd be
>> fed 40GB and ask xmalloc to allocate that much.
>
> I don't know much about how filter drivers work internally, but in some
> cases Git reads or writes from a file (file descriptor), and in other
> cases it reads or writes from a str+len pair (it probably predates
> strbuf) - I think in those cases the file needs to fit in memory (in
> size_t). So in some cases Git reads the file into memory. Whether it
> uses xmalloc or mmap, I don't know.
>
>>
>> It allows the reader to be lazy for buffer allocations as long as
>> you know the file fits in-core, at the cost of forcing the writer to
>> somehow come up with the total number of bytes even before sending a
>> single byte (in other words, if the writer cannot produce and hold
>> the data in-core, it may even have to spool the data in a temporary
>> file only to count, and then play it back after showing the total
>> size).
>
> For some types of filters you can know the size upfront:
>  - for filters such as rot13, with a 1-to-1 transformation, you know
>    that the output size is the same as the input size
>  - for block encodings, and for constant-width to constant-width
>    encoding conversion, the filter can calculate the output size from
>    the input size (e.g. <output size> = 2*<input size>)
>  - the filter may get the size from somewhere, for example the LFS
>    filter stub is constant size, and files are stored in artifactory
>    with their length
>
>>
>> It is good that you allow both modes of operation and the size of
>> the data can either be given upfront (which allows a single fixed
>> allocation upfront without realloc, as long as the data fits in
>> core), or be left "(atend)".
>
> I think the protocol should be either: <size> + <contents>, or
> <size unknown> + <contents> + <flush>, that is do not use flush
> packet if size is known upfront -- it would be a second point
> of truth (SPOT principle).

As I mentioned elsewhere, a <flush> packet is always sent right now.
I have no strong opinion on whether this is good or bad. The
implementation was a little bit simpler that way, which is why I did
it. I will implement whichever option the majority prefers :-)

Cheers,
Lars

>
>> I just don't want to see it oversold as a "feature" that the size
>> has to come before data. That is a limitation, not a feature.
>>
>> Thanks.
>>
>
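
For illustration only, the two framings proposed above (<size> +
<contents> versus <size unknown> + <contents> + <flush>) could be
sketched in Python roughly as follows. This is not the code from the
patch series. The "size=..." header line and all function names are
made up for the sketch. Only the pkt-line framing itself (a 4-hex-digit
length that includes the 4 header bytes, "0000" as the flush packet, at
most 65516 payload bytes per packet) follows Git's documented pkt-line
format.

```python
import io

MAX_DATA = 65516   # pkt-line payload limit (65520 minus the 4-byte header)
FLUSH = b"0000"    # flush packet: length field of zero, no payload


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line; the length counts the header too."""
    if len(payload) > MAX_DATA:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def send_known_size(out, data: bytes) -> None:
    """<size> + <contents>: no flush, the size alone ends the stream."""
    out.write(pkt_line(b"size=%d\n" % len(data)))   # hypothetical header
    for i in range(0, len(data), MAX_DATA):
        out.write(pkt_line(data[i:i + MAX_DATA]))


def send_unknown_size(out, chunks) -> None:
    """<size unknown> + <contents> + <flush>: the flush ends the stream."""
    out.write(pkt_line(b"size=unknown\n"))          # hypothetical header
    for chunk in chunks:
        for i in range(0, len(chunk), MAX_DATA):
            out.write(pkt_line(chunk[i:i + MAX_DATA]))
    out.write(FLUSH)


def read_pkt(inp):
    """Return the payload of the next packet, or None for a flush packet."""
    length = int(inp.read(4).decode("ascii"), 16)
    if length == 0:
        return None
    return inp.read(length - 4)


def receive(inp) -> bytes:
    """Read one stream in either framing and return its contents."""
    header = read_pkt(inp).rstrip(b"\n")
    value = header.split(b"=", 1)[1]
    buf = bytearray()
    if value == b"unknown":
        # Size not announced: keep reading until the flush packet.
        while True:
            pkt = read_pkt(inp)
            if pkt is None:
                break
            buf += pkt
    else:
        # Size announced: stop after exactly that many bytes, no flush.
        remaining = int(value.decode("ascii"))
        while remaining > 0:
            pkt = read_pkt(inp)
            buf += pkt
            remaining -= len(pkt)
    return bytes(buf)
```

With the first framing the reader can allocate once, but the writer has
to know the total before sending the first byte. With the second the
writer can stream, and the flush packet is the only end marker.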