Re: [RFC/PATCH 3/3] receive-pack: allow a maximum input size to be specified

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 15, 2016 at 09:57:29PM +0200, Christian Couder wrote:

> From: Jeff King <peff@xxxxxxxx>
> 
> Receive-pack feeds its input to either index-pack or
> unpack-objects, which will happily accept as many bytes as
> a sender is willing to provide. Let's allow an arbitrary
> cutoff point where we will stop writing bytes to disk.
> 
> What has already been written to disk can be cleaned
> outside of receive-pack.

This second paragraph hints at a related problem.

Git is generally happy to leave tmp_pack_* around to be cleaned up later
next time git-gc runs. Including its default 2-week grace time.

So imagine that tries to "git push" in a loop. And each time they push,
you say "nope, that's too big". And each time you acquire a new 2GB
tmp_pack file. If your goal was to prevent somebody from streaming
straight to your filesystem and filling up your disk, then it wasn't
very successful. :)

The simple fix is to call register_tempfile() in open_pack_file(), and
just have index-pack clean up the file on its way out.

But there are harder cases. For instance, imagine somebody pushes a
500MB file, and you have a pre-receive hook that says "too big; I won't
accept this". And then they push in a loop, as before. You've accepted
the incoming pack into the repository by the time the pre-receive runs.
You can't just delete it, because you don't know if other simultaneous
processes have started to depend on the objects.

To solve that, I have patches that put incoming packfiles into a
"quarantine" area, then run the connectivity check and pre-receive hooks
with the quarantine accessible via GIT_ALTERNATE_OBJECT_DIRECTORIES. And
then we either move the quarantine packs into the real repo, or blow
away the tmpdir, depending on whether the hooks said the objects were
OK.

Those are patches I plan to share upstream but just haven't gotten
around to yet.

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 92e1213..7627f7f 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -46,6 +46,7 @@ static int transfer_unpack_limit = -1;
>  static int advertise_atomic_push = 1;
>  static int advertise_push_options;
>  static int unpack_limit = 100;
> +static off_t max_input_size;
>  static int report_status;
>  static int use_sideband;
>  static int use_atomic;
> @@ -212,6 +213,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
>  		return 0;
>  	}
>  
> +	if (strcmp(var, "receive.maxsize") == 0) {
> +		max_input_size = git_config_ulong(var, value);
> +		return 0;
> +	}

Another off_t/ulong mismatch. I think you want git_config_int64() here.

> @@ -1650,6 +1656,9 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>  		if (fsck_objects)
>  			argv_array_pushf(&child.args, "--strict%s",
>  				fsck_msg_types.buf);
> +		if (max_input_size)
> +			argv_array_pushf(&child.args, "--max-input-size=%lu",
> +				max_input_size);

And here, PRIuMAX and uintmax_t. Or perhaps simpler, just store the
value as a string here and pass it on to index-pack (which would then
need to learn to handle suffixes like "2g"). We do a similar trick in
repack; see b861e23 (repack: propagate pack-objects options as strings,
2014-01-22).

> diff --git a/t/t5546-push-limits.sh b/t/t5546-push-limits.sh
> new file mode 100755
> index 0000000..d3a4d1a
> --- /dev/null
> +++ b/t/t5546-push-limits.sh
> @@ -0,0 +1,47 @@
> +#!/bin/sh
> +
> +test_description='check input limits for pushing'
> +. ./test-lib.sh
> +
> +test_expect_success 'create known-size commit' '
> +	test-genrandom foo 1024 >file &&
> +	git add file &&
> +	test_commit one-k
> +'
> +
> +test_expect_success 'create remote repository' '
> +	git init --bare dest &&
> +	git --git-dir=dest config receive.unpacklimit 1
> +'

We're going to do basically the same battery of tests against an
unpacklimit of "1" (to catch index-pack) and of "10" (to catch
unpack-objects). It might be clearer to just have a for-loop like:

  for unpacklimit in 1 100
  do
	test_expect_success 'create remote repository' '
		rm -rf dest &&
		git init --bare dest &&
		git -C dest config receive.unpacklimit $unpacklimit
	'

	test_expect_success 'receive.maxsize rejects push' '
		git -C dest config receive.maxsize 512 &&
		test_must_fail git push dest HEAD &&
	'

	test_expect_success 'bumping limit allows push' '
		git -C dest config receive.maxsize 4k &&
		git push dest HEAD
	'
  done

and it's probably worth a comment at the top of the loop explaining what
the heck those numbers mean. :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]