> On 27 Jul 2016, at 20:11, Jeff King <peff@xxxxxxxx> wrote: > > On Wed, Jul 27, 2016 at 07:31:26PM +0200, Lars Schneider wrote: > >>>> + strbuf_grow(sb, size + 1); // we need one extra byte for the packet flush >>> >>> What happens if size is the maximum for size_t here (i.e., 4GB-1 on a >>> 32-bit system)? >> >> Would that be an acceptable solution? >> >> if (size + 1 > SIZE_MAX) >> return die("unrepresentable length for filter buffer"); > > No, because by definition "size" will wrap to 0. :) > > You have to do: > > if (size > SIZE_MAX - 1) > die("whoops"); > >> Can you point me to an example in the Git source how this kind of thing should >> be handled? > > The strbuf code itself checks for overflows. So you could do: > > strbuf_grow(sb, size); > ... fill up with size bytes ... > strbuf_addch(sb, ...); /* extra byte for whatever */ > > That does mean _possibly_ making a second allocation just to add the > extra byte, but in practice it's not likely (unless the input exactly > matches the strbuf's growth pattern). > > If you want to do it yourself, I think: > > strbuf_grow(sb, st_add(size, 1)); I like that solution! Thanks! > would work. > >>>> + while ( >>>> + bytes_read > 0 && // the last packet was no flush >>>> + sb->len - total_bytes_read - 1 > 0 // we still have space left in the buffer >>>> + ); >>> >>> And I'm not sure if you need to distinguish between "0" and "-1" when >>> checking byte_read here. >> >> We want to finish reading in both cases, no? > > If we get "-1", that's from an unexpected EOF during the packet_read(), > because you set GENTLE_ON_EOF. So there's nothing left to read, and we > should break and return an error. Right. > I guess "0" would come from a flush packet? Why would the filter send > back a flush packet (unless you were using them to signal end-of-input, > but then why does the filter have to send back the number of bytes ahead > of time?). Sending the bytes ahead of time (if available) might be nice for efficient buffer allocation. I am changing the code so that both cases can be handled (size ahead of time and no size ahead of time). >>> Why 8K? The pkt-line format naturally restricts us to just under 64K, so >>> why not take advantage of that and minimize the framing overhead for >>> large data? >> >> I took inspiration from here for 8K MAX_IO_SIZE: >> https://github.com/git/git/blob/master/copy.c#L6 >> >> Is this read limit correct? Should I read 8 times to fill a pkt-line? > > MAX_IO_SIZE is generally 8 _megabytes_, not 8K. The loop in copy.c just > haad to pick an arbitrary size for doing its read/write proxying. I > think in practice you are not likely to get much benefit from going > beyond 8K or so there, just because operating systems tend to do things > in page-sizes anyway, which are usually 4K. > > But since you are formatting the result into a form that has framing > overhead, anything up to LARGE_PACKET_MAX will see benefits (though > admittedly even 4 bytes per 8K is not much). > > I don't think it's worth the complexity of reading 8 times, but just > using a buffer size of LARGE_PACKET_MAX-4 would be the most efficient. > > I doubt it matters _that much_ in practice, but any time I see a magic > number I have to wonder at the "why". At least basing it on > LARGE_PACKET_MAX has some basis, whereas 8K is largely just made-up. :) Sounds good. I will use LARGE_PACKET_MAX-4 ! > >>> We do sometimes do "ret |= something()" but that is in cases where >>> "ret" is zero for success, and non-zero (usually -1) otherwise. Perhaps >>> your function's error-reporting is inverted from our usual style? >> >> I thought it makes the code easier to read and the filter doesn't care >> at what point the error happens anyways. The filter either succeeds >> or fails. What style would you suggest? > > I think that's orthogonal. I just mean that using zero for success puts > you in our usual style, and then accumulating errors can be done with > "|=". Ah. I guess I was misguided by the way errors are currently handled in `apply_filter` (success = 1; failure = 0): https://github.com/git/git/blob/8c6d1f9807c67532e7fb545a944b064faff0f70b/convert.c#L437-L479 I wouldn't like if the different filter protocols would use different error exit codes. Would it be OK to adjust the existing `apply_filter` function in a cleanup patch? > I didn't look carefully at whether the accumulating style you're using > makes sense or not. But something like: > >>>> + ret &= write_in_full(out, &header, sizeof(header)) == sizeof(header); >>>> + ret &= write_in_full(out, src, bytes_to_write) == bytes_to_write; > > does mean that we call the second write() even if the first one failed. > That's a waste of time (albeit a minor one), but it also means you could > potentially cover up the value of "errno" from the first one (though in > practice I'd expect the second one to fail for the same reason). Oh. You're right. For some reason I thought the second operator would never be evaluated if the first operator is 0. Apparently that is not the case for bit-wise & ... only for logical & ... thanks for the lesson! - Lars-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html