On 2016-07-29 at 19:35, Junio C Hamano wrote:
> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
>
>> I think sending it upfront is nice for buffer allocations of big files
>> and it doesn't cost us anything to do it.
>
> While I do NOT think "total size upfront" MUST BE avoided at all costs,
> I do not think the above statement to justify it makes ANY sense.
>
> Big files are by definition something you cannot afford to hold its
> entirety in core, so you do not want to be told that you'd be fed 40GB
> and ask xmalloc to allocate that much.

I don't know much about how filter drivers work internally, but in some
cases Git reads from or writes to a file (file descriptor), and in other
cases it reads from or writes to a str+len pair (which probably predates
strbuf). I think that in the latter cases the file needs to fit in memory
(and its size in a size_t). So in some cases Git reads the whole file into
memory. Whether it uses xmalloc or mmap, I don't know.

> It allows the reader to be lazy for buffer allocations as long as
> you know the file fits in-core, at the cost of forcing the writer to
> somehow come up with the total number of bytes even before sending a
> single byte (in other words, if the writer cannot produce and hold
> the data in-core, it may even have to spool the data in a temporary
> file only to count, and then play it back after showing the total
> size).

For some types of filters you can know the size upfront:

 - for filters such as rot13, with a 1-to-1 transformation, you know that
   the output size is the same as the input size

 - for block encodings, and for constant-width to constant-width encoding
   conversion, the filter can calculate the output size from the input
   size (e.g. <output size> = 2*<input size>)

 - the filter may get the size from somewhere else; for example, the LFS
   filter stub is of constant size, and files are stored in artifactory
   with their length

> It is good that you allow both mode of operations and the size of
> the data can either be given upfront (which allows a single fixed
> allocation upfront without realloc, as long as the data fits in
> core), or be left "(atend)".

I think the protocol should be either

    <size> + <contents>

or

    <size unknown> + <contents> + <flush>

that is, do not use a flush packet if the size is known upfront -- it
would be a second point of truth, which goes against the SPOT (single
point of truth) principle.

> I just don't want to see it oversold as a "feature" that the size
> has to come before data. That is a limitation, not a feature.
>
> Thanks.
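
To make the two framings proposed above concrete, here is a rough sketch,
not a proposal for the actual wire format. It assumes pkt-line framing
(four hex digits of length, "0000" as the flush packet), which the mention
of a flush packet suggests. The header strings "size=<n>" and
"size=unknown", and all function names, are hypothetical and only there
for illustration.

```python
import io

MAX_PAYLOAD = 65516  # largest pkt-line payload: 65520 minus the 4-byte length
FLUSH = b"0000"      # flush packet: a length field of zero, no payload


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line: 4 hex digits of total length, then data."""
    if not 0 < len(payload) <= MAX_PAYLOAD:
        raise ValueError("payload must be 1..65516 bytes")
    return b"%04x" % (len(payload) + 4) + payload


def write_chunked(out, data: bytes) -> None:
    """Split data into pkt-lines of at most MAX_PAYLOAD bytes each."""
    for i in range(0, len(data), MAX_PAYLOAD):
        out.write(pkt_line(data[i:i + MAX_PAYLOAD]))


def write_known_size(out, data: bytes) -> None:
    """<size> + <contents>: the size is the only end marker, no flush."""
    out.write(pkt_line(b"size=%d\n" % len(data)))  # hypothetical header
    write_chunked(out, data)


def write_unknown_size(out, chunks) -> None:
    """<size unknown> + <contents> + <flush>: the flush ends the stream."""
    out.write(pkt_line(b"size=unknown\n"))  # hypothetical header
    for chunk in chunks:
        write_chunked(out, chunk)
    out.write(FLUSH)


def read_pkt(inp):
    """Return the next payload, or None for a flush packet."""
    head = inp.read(4)
    if len(head) != 4:
        raise EOFError("truncated pkt-line header")
    length = int(head.decode("ascii"), 16)
    if length == 0:
        return None
    if length < 5:
        raise ValueError("invalid pkt-line length")
    payload = inp.read(length - 4)
    if len(payload) != length - 4:
        raise EOFError("truncated pkt-line payload")
    return payload


def read_content(inp) -> bytes:
    """Read one content stream in either of the two framings."""
    header = read_pkt(inp)
    if header is None or not header.startswith(b"size="):
        raise ValueError("expected a size header")
    value = header[len(b"size="):].strip().decode("ascii")
    if value == "unknown":
        parts = []
        while True:  # read until the flush packet
            payload = read_pkt(inp)
            if payload is None:
                return b"".join(parts)
            parts.append(payload)
    remaining = int(value)
    buf = bytearray()
    while remaining > 0:  # read exactly the announced number of bytes
        payload = read_pkt(inp)
        if payload is None or len(payload) > remaining:
            raise ValueError("content does not match announced size")
        buf += payload
        remaining -= len(payload)
    return bytes(buf)
```

With a known size the reader stops after exactly that many bytes, so a
trailing flush would only be a second, redundant end marker. With an
unknown size the flush packet is the only way to tell where the content
ends.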