Re: [RFC] prune: --expire=seconds

Junio C Hamano <junkio@xxxxxxx> wrote:
> Matthias Lederhofer <matled@xxxxxxx> writes:
> 
> > This option specifies the minimum age of an object before it
> > may be removed by prune.  The default value is 2 hours and
> > may be changed using gc.pruneexpire.
> 
> I am not sure if this is needed, as Shawn explained earlier
> rounds of loose-objects safety work.

I think we need this fix.  We still have a race condition between
the loose object creation and the ref update.

We've closed this hole completely in the large push case (objects
>=receive.unpackLimit) and 'fetch -k' case by creating .keep files
before the .pack file, updating refs, then deleting the .keep file;
and by making sure git-repack leaves packs with .keeps alone.  So
we cannot lose an object here.
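
As a rough illustration of that ordering (a toy sketch, not git's
actual code; the helper names and the file contents are made up):

```python
import os

def store_pack_with_keep(pack_dir, name, data, update_ref):
    """Hypothetical helper: create the .keep first, then the pack,
    then update refs, and only then delete the .keep."""
    keep = os.path.join(pack_dir, f"pack-{name}.keep")
    pack = os.path.join(pack_dir, f"pack-{name}.pack")
    with open(keep, "w") as f:
        f.write("transfer in progress\n")
    with open(pack, "wb") as f:
        f.write(data)
    update_ref()      # the refs now point into the new pack
    os.remove(keep)   # only now may a repack touch this pack
    return pack

def repackable_packs(pack_dir):
    """Packs a repack may consolidate: those with no matching .keep."""
    result = []
    for entry in sorted(os.listdir(pack_dir)):
        if entry.endswith(".pack"):
            base = entry[:-len(".pack")]
            if not os.path.exists(os.path.join(pack_dir, base + ".keep")):
                result.append(entry)
    return result
```

While update_ref() is running, repackable_packs() does not list the
new pack, so nothing can delete it before the refs exist.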

But update-index/add/merge-recursive/write-tree/commit-tree, etc.
as well as small pushes (objects <receive.unpackLimit) and fetch
without -k option still have a race condition.  The objects will
be created/unpacked into the loose objects directory with nothing
referencing them, and a prune which gets to run just before
the ref update becomes visible would probably whack those objects.
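
The interleaving I'm worried about looks roughly like this (a toy
model only: objects are names in a set, and "reachable" is simplified
to "directly pointed at by a ref"):

```python
def simulate_unsafe_prune():
    """Toy model of the race; none of this is git code."""
    objects = {"old-commit"}
    refs = {"refs/heads/master": "old-commit"}

    # Step 1: commit-tree (or a small push) writes a new loose object.
    objects.add("new-commit")

    # A prune with no age check runs now and keeps only what refs point at.
    objects &= set(refs.values())

    # Step 2: the ref update becomes visible, too late.
    refs["refs/heads/master"] = "new-commit"

    # Refs whose target object no longer exists.
    dangling = {ref for ref, target in refs.items() if target not in objects}
    return objects, dangling
```

After the last step the ref names an object that prune has already
removed.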

Given that 'git gc' is the encouraged way to maintain a repository,
and that 'repack -a -d' is safe, and prune-packed is equally safe,
I think we should try to make prune safe too.  Matthias' patch
does this by giving the ref update process a fairly large window
to perform its action within.
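
In sketch form, the age check amounts to something like the following
(an illustration of the idea, not Matthias' patch; the function name
and the dict-of-mtimes representation are assumptions):

```python
TWO_HOURS = 2 * 60 * 60  # the default proposed in the patch, in seconds

def prune_candidates(objects, referenced, now, min_age=TWO_HOURS):
    """objects maps object name -> mtime in seconds.

    Return the unreferenced objects that are at least min_age old;
    younger unreferenced objects are left alone so that a pending
    ref update still has time to land.
    """
    return sorted(
        name
        for name, mtime in objects.items()
        if name not in referenced and now - mtime >= min_age
    )
```

For example, with now=7200 an unreferenced object written at time 0 is
a candidate, while one written at time 7000 is kept.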
 
> If this is something we would want, it might make sense if we
> allowed "prune --expire='1.day'" syntax ;-).

Yes, I agree.

Matthias, you can take a look at builtin-reflog.c's argument handling
for an example.  I think you just need to use approxidate() in both
your config function and in your command line argument handling.
Then the default becomes '2.hours.ago' instead of just "2" (at
least from a documentation perspective).
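
To show the kind of values I mean, here is a deliberately tiny
stand-in parser. It is not approxidate(), which accepts far more
formats; the function name, the unit table, and treating '1.day' the
same as '1.day.ago' are all assumptions of this sketch:

```python
import re

# Seconds per unit; only a handful of units, for illustration.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600,
                "day": 86400, "week": 604800}

def parse_expire(spec, now):
    """Return the cutoff timestamp for specs like '2.hours.ago' or '1.day'.

    Objects with an mtime newer than the cutoff would not be pruned.
    """
    m = re.fullmatch(r"(\d+)\.([a-z]+?)s?(?:\.ago)?", spec)
    if m is None or m.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported expire spec: {spec!r}")
    return now - int(m.group(1)) * UNIT_SECONDS[m.group(2)]
```

The same function would serve both the config value and the command
line argument, so the two cannot disagree about syntax.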

Though the more I think about this, perhaps the default should be
'1.day'.  24 hours is a helluva large window for any current ref
update to complete in, even if the ref update was some massive rsync
which is moving such a large volume of data over a low-bandwidth
link that it takes 20 hours to complete.  Besides, users could
always force it to be much lower with the command line option if
they really need to prune _right_now_.

-- 
Shawn.