Re: Diagnosing stray/stale .keep files -- explore what is in a pack?

On Tue, Jan 14, 2014 at 02:42:09PM -0500, Martin Langhoff wrote:

>  On Tue, Jan 14, 2014 at 2:36 PM, Martin Fick <mfick@xxxxxxxxxxxxxx> wrote:
> > Perhaps the receiving process is dying hard and leaving
> > stuff behind?  Out-of-memory, out of disk space?
> 
> Yes, that's my guess as well. This server had gc misconfigured, so it
> hit ENOSPC a few weeks ago.
> 
> It is likely that the .lock files were left behind back then, and
> since then the clients pushing to these refs were transferring their
> whole history and still failing to update the ref, leading to rapid
> repo growth.

We see these occasionally at GitHub, too. I haven't yet figured out a
definite cause, though whatever it is, it's relatively rare.

I think the ".keep" files and the ".lock" files are in two separate
boats, though.

index-pack, as run by receive-pack, creates the .keep file as a "lock"
between the time the new pack is moved into place and the time
receive-pack updates the refs (so that a simultaneous repack does not
think the pack should be removed). Receive-pack then updates the refs and
removes the ".keep" file. The code that runs in the interim is just
updating the refs, and it is careful to return any errors rather than
calling die() (so if ENOSPC prevented a ref write, that would not cause
this). So for us to leave a .keep there, it is probably one of:

  1. A few generic library functions, like xmalloc, can cause us to die.
     This should be very rare, though.

  2. We tried to unlink the keep-file, but couldn't (could ENOSPC
     prevent a deletion? I suspect it depends on the filesystem).

  3. We were killed by signal (or system crash).
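The window described above can be modeled in a few lines. This is a toy Python sketch, not git's C code: `receive_pack_sketch` and its `update_refs` callback are hypothetical names, and the "pack" is placeholder bytes. It shows why an ordinary ref-update error still removes the .keep, while dying mid-update (the die() or signal cases) leaves it behind.

```python
import os
import tempfile


def receive_pack_sketch(pack_dir, name, update_refs):
    """Toy model of the .keep window (hypothetical, not git's code).

    update_refs() returns an error string (or None) on an ordinary failure,
    or raises to simulate the process dying mid-update.
    """
    keep = os.path.join(pack_dir, f"pack-{name}.keep")
    pack = os.path.join(pack_dir, f"pack-{name}.pack")
    # 1. Create the .keep marker before the pack becomes visible, so a
    #    concurrent repack leaves this pack alone until refs point at it.
    with open(keep, "w") as f:
        f.write("receive-pack sketch\n")
    with open(pack, "wb") as f:
        f.write(b"PACK")  # placeholder bytes, not a real packfile
    # 2. Update refs; ordinary failures (e.g. ENOSPC writing a ref) come
    #    back as a value instead of terminating the process.
    error = update_refs()
    # 3. Remove the marker. Not reached if update_refs() raises (the
    #    analogue of die() or a signal); an unlink() failure would raise
    #    here and also leave the file behind.
    os.unlink(keep)
    return error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Clean failure: the error is reported and the .keep is removed.
        print(receive_pack_sketch(d, "a", lambda: "ENOSPC writing ref"))

        def die():
            raise SystemExit("fatal: out of memory")

        # Hard failure: the process "dies" and the .keep is left behind.
        try:
            receive_pack_sketch(d, "b", die)
        except SystemExit:
            pass
        print(sorted(os.listdir(d)))
```

Run as a script, the final listing still contains pack-b.keep next to both packs, which is the stray-file symptom from the thread.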

Fetch-pack also creates .keep files, and it is much less careful
during the time the file exists. However, busy servers tend to be
receiving pushes, not initiating fetches.
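To tell these cases apart on a server, one option is to read each .keep file and check whether the process that wrote it is still alive. This Python sketch assumes the file holds a message of the form "receive-pack <pid> on <host>" or "fetch-pack <pid> on <host>" (what index-pack writes when given such a --keep=<msg>); verify that against your git version. The function names are hypothetical. A .keep without that message may have been created deliberately to pin a pack, so it is reported but never flagged as stale.

```python
import os
import re
import socket
import time

# Assumed message format; verify against the .keep files your git writes.
KEEP_MSG = re.compile(r"^(receive-pack|fetch-pack) (\d+) on (\S+)\s*$")


def pid_alive(pid):
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def classify_keep_files(pack_dir, hostname=None, is_alive=pid_alive):
    """Return (file, status, age_hours, message) for each .keep in pack_dir."""
    hostname = hostname or socket.gethostname()
    report = []
    for entry in sorted(os.listdir(pack_dir)):
        if not entry.endswith(".keep"):
            continue
        path = os.path.join(pack_dir, entry)
        with open(path) as f:
            msg = f.read().strip()
        age_hours = (time.time() - os.path.getmtime(path)) / 3600
        m = KEEP_MSG.match(msg)
        if not m:
            status = "manual-or-unknown"  # possibly intentional; leave it
        elif m.group(3) != hostname:
            status = "other-host"  # cannot check a remote pid from here
        elif is_alive(int(m.group(2))):
            # Caveat: pids get reused, so this can be a false "in-progress".
            status = "in-progress"
        else:
            status = "stale"
        report.append((entry, status, round(age_hours, 1), msg))
    return report
```

Pointed at a repository's objects/pack directory, entries marked "stale" that are also old are the likeliest leftovers from a dead receive-pack or fetch-pack.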

By contrast, ".lock" files are registered with a signal/atexit handler
that cleans them up automatically on program exit. So stale ones really
should only be caused by a system crash (or "kill -9"), and that has
generally been our experience at GitHub. Again, if ENOSPC could prevent
deletion on your filesystem, it could be related, but there is not much
git can do to clean up if unlink() fails us.

-Peff