Re: Is there a --stat or --numstat like option that'll allow me to have my cake and eat it too?

On Tue, Mar 08, 2016 at 04:08:21PM +0100, Ævar Arnfjörð Bjarmason wrote:

> What I really want is something for git-log more like
> git-for-each-ref, so I could emit the following info for each file
> being modified delimited by some binary marker:
> 
>     - file name before
>     - file name after
>     - is rename?
>     - is binary?
>     - size in bytes before
>     - size it bytes after
>     - removed lines
>     - added lines

If you get the full sha1s of each object (e.g., by adding --raw), then
you can dump them all to a single cat-file invocation to efficiently get
the sizes.
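
A minimal sketch of that pipeline, assuming a range such as $old..$new is passed in. The helper name new_blob_sizes is hypothetical, and the awk field positions rely on the documented --raw line layout (":<mode> <mode> <sha1> <sha1> <status>", a tab, then the path):

```shell
# Hypothetical helper: print "<size> <sha1> <path>" for the post-image
# blob of every change in the given range, using one cat-file process.
new_blob_sizes() {
	git log --raw --no-abbrev --format= "$1" |
	awk -F'\t' '/^:/ {
		split($1, f, " ")
		# skip deletions (all-zero sha1) and submodule entries
		if (f[4] !~ /^0+$/ && f[2] != "160000") print f[4], $NF
	}' |
	git cat-file --batch-check='%(objectsize) %(objectname) %(rest)'
}
```

Something like "new_blob_sizes $old..$new | sort -rn | head" would then list the largest blobs first. Merge commits show no --raw lines by default, so this sketch does not look inside them.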

I'm not quite sure I understand why you want to know about renames and
added/removed lines if you are just blocking binary files. If I were
implementing this[1], I'd probably just block based on blob size, which
you can do with:

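  # List every object in the range, look up all sizes with a single
  # cat-file process, and complain about the first one over the limit.
  git rev-list --objects "$old..$new" |
  git cat-file --batch-check='%(objectsize) %(objectname) %(rest)' |
  perl -alne 'print if $F[0] > 1_000_000; # or whatever' |
  while read -r size sha1 file; do
	echo "Whoops, $file ($sha1) is too big" >&2
	exit 1
  done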
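
For context, here is one way the same check might be wired into a pre-receive hook, which gets "<old> <new> <ref>" lines on stdin. This is only a sketch: the function name oversized_blobs is hypothetical, and the "--not --all" trick (list only objects not already reachable from existing refs) is an assumption added here so that brand-new refs work, not something taken from the pipeline above.

```shell
# Hypothetical helper for a pre-receive hook.
# usage: oversized_blobs <limit-in-bytes>   (reads hook stdin)
# Prints offending blobs and returns 1 if any blob exceeds the limit.
oversized_blobs() {
	while read -r old_rev new_rev ref_name; do
		# an all-zero new value means the ref is being deleted
		case $new_rev in *[!0]*) ;; *) continue ;; esac
		# only objects not already reachable from existing refs
		git rev-list --objects "$new_rev" --not --all
	done |
	git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
	awk -v limit="$1" '
		$1 == "blob" && $2 > limit { print; bad = 1 }
		END { exit bad + 0 }'
}
```

A hook body could then be as small as "oversized_blobs 1000000 >&2 || exit 1".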

You can also use %(objectsize:disk) to get the on-disk size (which can
tell you about things that don't compress well, which tend to be the
sorts of things you are trying to keep out).
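
As a rough illustration, the two placeholders can be printed side by side and compared. The helper name poorly_compressed and the 90% cutoff are arbitrary assumptions for this sketch. Note also that for a deltified object in a pack, the on-disk size reflects the stored delta, so the ratio is only a hint.

```shell
# Hypothetical helper: list blobs reachable from <rev-or-range> that are
# at least <min-bytes> long and whose on-disk size is 90% or more of
# their real size (i.e. they barely compress).
# usage: poorly_compressed <rev-or-range> [<min-bytes>]
poorly_compressed() {
	git rev-list --objects "$1" |
	git cat-file --batch-check='%(objecttype) %(objectsize) %(objectsize:disk) %(objectname) %(rest)' |
	awk -v min="${2:-0}" '$1 == "blob" && $2 >= min && $3 * 10 >= $2 * 9'
}
```

Each output line has the form "blob <size> <disk-size> <sha1> <path>".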

You can't ask about binary-ness, but I don't think it would be unreasonable
for cat-file to have a "would git consider this content binary?"
placeholder for --batch-check.
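
In the absence of such a placeholder, one workaround (a sketch, not a cat-file feature) is to lean on the diff machinery: --numstat prints "-" for both the added and removed counts of a file that git treats as binary. The helper name binary_paths is hypothetical.

```shell
# Hypothetical helper: print the paths that git's diff machinery treats
# as binary between two commits ("-" in both --numstat count columns).
# usage: binary_paths <old> <new>
binary_paths() {
	git diff --numstat "$1" "$2" |
	awk -F'\t' '$1 == "-" && $2 == "-" { print $3 }'
}
```

This follows whatever the diff would do, including any gitattributes in effect, and paths with unusual characters come out quoted as they do in other diff output.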

The other things are properties of the comparison, not of individual
objects, so you'll have to get them from "git log". But with some clever
scripting, I think you could feed those sha1s (or $commit:$path
specifiers) into a single cat-file invocation to get the before/after
sizes.
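
One possible shape for that scripting, comparing two endpoints rather than walking a log. The helper name before_after_sizes is hypothetical. It tags each sha1 with "before" or "after" on the way in and lets %(rest) carry the tag and path through a single cat-file process:

```shell
# Hypothetical helper: print "<size> before <path>" and
# "<size> after <path>" for every path that differs between two commits.
# usage: before_after_sizes <old> <new>
before_after_sizes() {
	git diff-tree -r --raw --no-abbrev "$1" "$2" |
	awk -F'\t' '{
		split($1, f, " ")
		# an all-zero sha1 means the path is absent on that side;
		# mode 160000 is a submodule, which has no blob to measure
		if (f[3] !~ /^0+$/ && f[1] != ":160000") print f[3], "before", $2
		if (f[4] !~ /^0+$/ && f[2] != "160000") print f[4], "after", $NF
	}' |
	git cat-file --batch-check='%(objectsize) %(rest)'
}
```

An added file yields only an "after" line and a deleted file only a "before" line.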

-Peff

[1] GitHub has hard and soft limits for various blob sizes, and at one
    point the implementation looked very similar to what I showed here.
    The downside is that for a large push, the rev-list can actually
    take a fair bit of time (e.g., consider pushing up all of the kernel
    history to a brand new repo), and this is on top of the similar work
    already done by index-pack and check_everything_connected().

    These days I have a hacky patch to notice the too-big size directly
    in index-pack, which is essentially free. It doesn't know about the
    file path, so we pull that out later in the pre-receive hook. But we
    only have to do so in the uncommon case that there _is_ actually a
    too-big file, so normal pushes incur no penalty.