[PATCH 0/6] receive-pack: quarantine pushed objects

I've mentioned before on the list that GitHub "quarantines" objects
while the pre-receive hook runs. Here are the patches to implement
that.

The basic problem is that as-is, index-pack admits pushed objects into
the main object database immediately, before the pre-receive hook runs.
It _has_ to, since the hook needs to be able to actually look at the
objects. However, this means that if the pre-receive hook rejects the
push, we still end up with the objects in the repository. We can't just
delete them as temporary files, because we don't know what other
processes might have started referencing them.

The solution here is to push into a "quarantine" directory that is
accessible only to pre-receive, check_connected(), etc., and to move
the objects into the main object database only after we've finished
those basic checks.
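
As a rough illustration of the idea (a toy model with plain files, not
the code from this series), the flow is: stage everything in a private
directory, run the checks against it, then either migrate or discard.
The function name and the `accept` callback are hypothetical stand-ins;
`accept` plays the role of pre-receive and the other checks.

```python
import os
import shutil
import tempfile


def receive_with_quarantine(main_dir, incoming, accept):
    """Toy model: stage incoming files, then migrate or discard them.

    main_dir: path of the "main object database" (a plain directory here)
    incoming: dict mapping file name -> bytes for the pushed objects
    accept:   callable standing in for pre-receive; gets the quarantine path
    """
    # Create the quarantine inside main_dir so the final rename stays on
    # one filesystem (an assumption of this sketch).
    quarantine = tempfile.mkdtemp(prefix="incoming-", dir=main_dir)
    try:
        # Pushed objects land only in the quarantine directory.
        for name, data in incoming.items():
            with open(os.path.join(quarantine, name), "wb") as f:
                f.write(data)

        # The check can inspect the staged objects, but nothing outside
        # the quarantine can see them yet.
        if not accept(quarantine):
            return False  # rejected: nothing ever reached main_dir

        # Accepted: move each object into the main directory.
        for name in os.listdir(quarantine):
            os.replace(os.path.join(quarantine, name),
                       os.path.join(main_dir, name))
        return True
    finally:
        # Remove the quarantine (and any leftovers) in every case.
        shutil.rmtree(quarantine, ignore_errors=True)
```

On rejection the staged files are simply deleted, which is safe in this
model because no other process was ever pointed at the quarantine
directory.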

One of the things we use it for at GitHub is object-size policy, which
we implement via a pre-receive hook (sort of; see below). This scheme
has been in use for about 2 years, though I did do a fair bit of
tweaking to make it ready for upstream (squashing bugfixes and merges
from upstream that came later, along with polishing a few rough edges I
saw while doing so). So I may have introduced new bugs. :)

The patches are:

  [1/6]: check_connected: accept an env argument
  [2/6]: sha1_file: always allow relative paths to alternates

    These two are preparatory.

  [3/6]: tmp-objdir: introduce API for temporary object directories
  [4/6]: receive-pack: quarantine objects until pre-receive accepts

    This is the interesting part.

  [5/6]: tmp-objdir: put quarantine information in the environment
  [6/6]: tmp-objdir: do not migrate files starting with '.'

    These are two changes that I ended up doing later to support another
    series. They're not strictly necessary here, but I think they're
    worth including now, as they change the visible behavior in minor
    ways. It seems like a good idea to start with what I think should be
    the final behavior.

    The other series is basically an optimization for the object-size
    policy. Without it, you are stuck walking the graph again in the
    pre-receive hook to find the new objects and check their sizes.

    But index-pack can do that for you very cheaply; it has the size of
    each object already. What it _doesn't_ do is produce nice error
    messages; it has no idea at what path the objects are found, and it
    doesn't know what kind of advice it should give the user.

    So what we can do is ask index-pack to make a note of any objects
    larger than N bytes, and write their sha1 and size into a file in
    the quarantine path. Then the pre-receive hook can look in that log
    and generate any nice message it wants. In the common case, the log
    is empty, and it does not have to do any work at all.

    These two patches set that up by telling index-pack and pre-receive
    where the quarantine path is, so they can use it to store arbitrary
    files that _don't_ get migrated to the main object database (i.e.,
    the log file mentioned above).
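
To make the hook side concrete, here is a minimal sketch of a
pre-receive hook reading such a log. Everything specific in it is an
assumption for illustration: the file name `.large-objects`, the
one-entry-per-line "<sha1> <size>" format, and the `QUARANTINE_PATH`
environment variable are placeholders, not names defined by this series.

```python
import os
import sys

# Hypothetical log name. The leading '.' matches the rule from patch 6/6
# that such files are not migrated.
LOG_NAME = ".large-objects"


def read_large_objects(quarantine_path, log_name=LOG_NAME):
    """Return a list of (object_id, size) pairs; empty if there is no log."""
    path = os.path.join(quarantine_path, log_name)
    if not os.path.exists(path):
        return []  # common case: nothing oversized, no work to do
    entries = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            # Assumed format: "<sha1> <size-in-bytes>"
            oid, size = line.split()
            entries.append((oid, int(size)))
    return entries


def main():
    """Hook body: return 1 to reject the push, 0 to accept it."""
    quarantine = os.environ.get("QUARANTINE_PATH")  # placeholder name
    if quarantine is None:
        return 0
    large = read_large_objects(quarantine)
    for oid, size in large:
        # This is where the hook can add whatever advice it wants.
        print(f"object {oid} is {size} bytes, over the size limit",
              file=sys.stderr)
    return 1 if large else 0
```

A hook script built on this would end with `sys.exit(main())`; a
non-zero exit from pre-receive rejects the push.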

-Peff


