On Mon, Aug 25, 2014 at 11:35:45AM -0700, Junio C Hamano wrote:

> Steffen Prohaska <prohaska@xxxxxx> writes:
>
> >> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
> >> obviously would not work for non-file inputs, but I think we address
> >> that already in index_fd: we push non-seekable things off to index_pipe,
> >> where we spool them to memory.
> >
> > It could be handled that way, but we would be back to the original problem
> > that 32-bit git fails for large files.
>
> Correct, and you are making an incremental improvement so that such
> a large blob can be handled _when_ the filters can successfully
> munge it back and forth. If we fail due to out of memory when the
> filters cannot, that would be the same as without your improvement,
> so you are still making progress.

I do not think my proposal makes anything worse than Steffen's patch.
_If_ you have a non-required filter, and _if_ we can run it, then we
stream the filter and hopefully end up with a small enough result to fit
into memory. If we cannot run the filter, we are screwed anyway (we
follow the regular code path and dump the whole thing into memory; i.e.,
the same as without this patch series).

I think the main argument against going further is just that it is not
worth the complexity. Tell people doing reduction filters they need to
use "required", and that accomplishes the same thing.

> >> So it seems like the ideal strategy would be:
> >>
> >>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
> >>
> >>  2. If it's not seekable and the filter is required, try streaming. We
> >>     die anyway if we fail.
>
> Puzzled... Is it assumed that any content the filters tell us to
> use the contents from the db as-is by exiting with non-zero status
> will always be large not to fit in-core? For small contents, isn't
> this "ideal" strategy a regression?

I am not sure what you mean by regression here.
We will try to stream more often, but I do not see that as a bad thing.

-Peff
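
For illustration only, the strategy discussed in this message could be
sketched roughly as below. This is not code from git or from the patch
series: the names filter_plan, choose_plan and fd_is_seekable are
hypothetical, and the sketch assumes that "If not" in step 1 means "if
streaming through the filter fails". It only shows the decision, not the
streaming itself.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of these exist in git. */
enum filter_plan {
	/* Stream through the filter; a failed required filter is fatal anyway. */
	PLAN_STREAM_OR_DIE,
	/* Stream through the filter; on failure, lseek() back (or mmap) and
	 * use the original contents as-is. */
	PLAN_STREAM_THEN_REWIND,
	/* Input cannot be re-read and the original may still be needed:
	 * spool it to memory first. */
	PLAN_SPOOL_TO_MEMORY
};

/*
 * Treat an fd as seekable if lseek() can report its current offset.
 * Pipes and sockets fail here with ESPIPE.
 */
static int fd_is_seekable(int fd)
{
	return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/* Pick a plan from the two properties the discussion turns on. */
static enum filter_plan choose_plan(int seekable, int filter_required)
{
	if (filter_required)
		return PLAN_STREAM_OR_DIE;      /* no fallback is ever needed */
	if (seekable)
		return PLAN_STREAM_THEN_REWIND; /* fallback re-reads the file */
	return PLAN_SPOOL_TO_MEMORY;            /* same as the existing path */
}
```

A caller would compute choose_plan(fd_is_seekable(fd), required) once per
input. Checking "required" first is a simplification made in this sketch:
a required filter never needs the original contents back, so whether the
input is seekable does not matter in that case.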