Re: [PATCH v2 5/5] convert: add filter.<driver>.process option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 27, 2016 at 02:06:05AM +0200, larsxschneider@xxxxxxxxx wrote:

> +static off_t multi_packet_read(struct strbuf *sb, const int fd, const size_t size)
> +{
> +	off_t bytes_read;
> +	off_t total_bytes_read = 0;

I haven't looked carefully at the whole patch yet, but there seems to be
some type issues here. off_t is a good type for storing the whole size
of a file (which may be larger than the amount of memory we can
allocate). But size_t is the right size for an in-memory object.

This function takes a size_t size, which makes sense if it is meant to
read everything into a strbuf.

So I think our total_bytes_read would probably want to be a size_t here,
too, because it cannot possibly grow larger than that (and that is
enforced by the loop below). Otherwise you get weirdness like "sb->buf +
total_bytes_ref" possibly overflowing memory.

> +	strbuf_grow(sb, size + 1);	// we need one extra byte for the packet flush

What happens if size is the maximum for size_t here (i.e., 4GB-1 on a
32-bit system)?

> +	do {
> +		bytes_read = packet_read(
> +			fd, NULL, NULL,
> +			sb->buf + total_bytes_read, sb->len - total_bytes_read - 1,
> +			PACKET_READ_GENTLE_ON_EOF
> +		);

packet_read() actually returns an int, and may return "-1" on EOF (and
int is fine because we know that we are constrained to 16-bit values
by the pkt-line definition). You read it into an "off_t". I _think_ that
is OK, because I believe POSIX says off_t must be signed. But probably
"int" is the more correct type here.

> +		total_bytes_read += bytes_read;

If you do get "-1", I think you need to detect it here before adjusting
total_bytes_read.

> +	while (
> +		bytes_read > 0 && 					// the last packet was no flush
> +		sb->len - total_bytes_read - 1 > 0 	// we still have space left in the buffer
> +	);

And I'm not sure if you need to distinguish between "0" and "-1" when
checking byte_read here.

> +	strbuf_setlen(sb, total_bytes_read);

Passing an off_t to something expecting a size_t, which can involve
truncation (though I think in practice you really are limited to
size_t).

> +static int multi_packet_write(const char *src, size_t len, const int in, const int out)
> +{
> +	int ret = 1;
> +	char header[4];
> +	char buffer[8192];
> +	off_t bytes_to_write;
> +	while (ret) {
> +		if (in >= 0) {
> +			bytes_to_write = xread(in, buffer, sizeof(buffer));

Likewise here, xread() is returning ssize_t. Again, OK if we can assume
off_t is signed, but it probably makes sense to use the correct type (we
also know it cannot be larger than 8K, of course).

Why 8K? The pkt-line format naturally restricts us to just under 64K, so
why not take advantage of that and minimize the framing overhead for
large data?

> +			if (bytes_to_write < 0)
> +				ret &= 0;

I think "&= 0" is unusual for our codebase? Would just writing "= 0" be
more clear?

We do sometimes do "ret |= something()" but that is in cases where
"ret" is zero for success, and non-zero (usually -1) otherwise. Perhaps
your function's error-reporting is inverted from our usual style?

> +		set_packet_header(header, bytes_to_write + 4);
> +		ret &= write_in_full(out, &header, sizeof(header)) == sizeof(header);
> +		ret &= write_in_full(out, src, bytes_to_write) == bytes_to_write;
> +	}

If you look at format_packet(), it pulls a slight trick: we have a
buffer 4 bytes larger than we need, format into "buf + 4", and then
write the final size at the beginning. That lets us write() it all in
one go.

At first I thought this function was simply reinventing packet_write(),
but I guess you are trying to avoid the extra copy of the data (once
into the buffer from xread, and then again via format_packet just to add
the extra bytes at the beginning).

I agree with what Junio said elsewhere, that there may be a way to make
the pkt-line code handle this zero-copy situation better. Perhaps
something like:

  struct pktline {
	/* first 4 bytes are reserved for length header */
	char buf[LARGE_PACKET_MAX];
  };
  #define PKTLINE_DATA_START(pkt) ((pkt)->buf + 4)
  #define PKTLINE_DATA_LEN (LARGE_PACKET_MAX - 4)

  ...
  struct pktline pkt;
  ssize_t len = xread(fd, PKTLINE_DATA_START(&pkt), PKTLINE_DATA_LEN);
  packet_send(&pkt, len);

Then packet_send() knows that the first 4 bytes are reserved for it. I
suspect that the strbuf used by format_packet() could get away with
using such a "struct pktline" too, though in practice I doubt there's
any real efficiency to be gained (we generally reuse the same strbuf
over and over, so it will grow once to 64K and get reused).

> +	ret &= write_in_full(out, "0000", 4) == 4;

packet_flush() ?

I know the packet functions are keen on write_or_die() versus
write_in_full().  That is perhaps something that should be fixed.

This was just supposed to be a short note about off_t before eating
dinner (oops), so I didn't read past here.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]