Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap

Jeff King <peff@xxxxxxxx> writes:

> On Mon, Aug 25, 2014 at 11:35:45AM -0700, Junio C Hamano wrote:
>
>> Steffen Prohaska <prohaska@xxxxxx> writes:
>> 
>> >> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
>> >> obviously would not work for non-file inputs, but I think we address
>> >> that already in index_fd: we push non-seekable things off to index_pipe,
>> >> where we spool them to memory.
>> >
>> > It could be handled that way, but we would be back to the original problem
>> > that 32-bit git fails for large files.
>> 
>> Correct, and you are making an incremental improvement so that such
>> a large blob can be handled _when_ the filters can successfully
>> munge it back and forth.  If we run out of memory when the filters
>> cannot, that would be the same as without your improvement, so you
>> are still making progress.
>
> I do not think my proposal makes anything worse than Steffen's patch.

I think we are saying the same thing, but perhaps I didn't phrase it
well.
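
(As an aside, for anybody following from the sidelines: "required"
below refers to the filter.<driver>.required configuration.  A
driver that must succeed would be declared along these lines; the
driver name "huge" and the two commands are made up for illustration
only:

    [filter "huge"]
            clean = huge-clean
            smudge = huge-smudge
            required = true

together with something like "*.bin filter=huge" in .gitattributes.
With "required" set, a failing filter becomes an error instead of
making us silently fall back to using the original contents.)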

> I think the main argument against going further is just that it is not
> worth the complexity. Tell people doing reduction filters they need to
> use "required", and that accomplishes the same thing.
>
>> >> So it seems like the ideal strategy would be:
>> >> 
>> >>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
>> >> 
>> >>  2. If it's not seekable and the filter is required, try streaming. We
>> >>     die anyway if we fail.
>> 
>> Puzzled...  Is it assumed that any content, for which the filters
>> tell us to use the contents from the db as-is by exiting with a
>> non-zero status, will always be too large to fit in core?  For
>> small contents, isn't this "ideal" strategy a regression?
>
> I am not sure what you mean by regression here. We will try to stream
> more often, but I do not see that as a bad thing.

I thought the proposed flow I was commenting on was

    - try streaming and die if the filter fails

For an optional filter working on contents that would fit in core,
we currently do

    - slurp in memory, filter it, use the original if the filter fails

If we switched to 2., then... ahh, ok, I misread the "is required"
part.  The "regression" does not apply to that case at all.

