Re: topological index field for commit objects

On Thu, Jun 30, 2016 at 12:30:31PM +0200, Jakub Narębski wrote:

> > This is one of the open questions. My older patches turned them off when
> > replacements and grafts are in effect.
> 
> Well, if you store the cache of generation numbers in the packfile, or in
> the index of the packfile, or in the bitmap file, or in separate bitmap-like
> file, generating them on repack, then of course any grafts or replacements
> invalidate them... though for low level commands (like object counting)
> replacements are transparent -- or rather they are (and can be) treated as
> any other ref for reachability analysis.
> 
> Well, if there are no grafts, you could still use them for doing
> "git --no-replace-objects log ...", couldn't you?

Yes, replace refs don't invalidate the concept of a cache. They just
mean that the cache's invariants no longer hold for a specific view of
the history, so you need a cache which matches that view.

It has been several years, but I remember at one point having patches
that summarized the graft/replace state as a single hash, and only used
the cache if it matched that state. So you could actually keep a cache
for some set of replace-refs that you have, as well as a cache for the
case that you've turned them off, etc.

I don't think that level of complexity is really worth it, though.
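
As a rough illustration of the state-hash idea above (not the actual
patches, which are not shown in this thread), one could key each cache
by a digest of the replace refs and grafts in effect. The names
view_key and GenerationCaches, and the serialization fed to the hash,
are assumptions made up for this sketch.

```python
import hashlib


def view_key(replace_refs, grafts, use_replacements=True):
    """Summarize the graft/replace state as a single hex digest.

    replace_refs: dict of original object id -> replacement object id
    grafts:       dict of commit id -> list of grafted parent ids
    (Both shapes are hypothetical, chosen for the sketch.)
    """
    h = hashlib.sha1()
    if use_replacements:
        # Sort so the digest does not depend on dict ordering.
        for orig, repl in sorted(replace_refs.items()):
            h.update(f"replace {orig} {repl}\n".encode())
    for commit, parents in sorted(grafts.items()):
        h.update(f"graft {commit} {' '.join(parents)}\n".encode())
    return h.hexdigest()


class GenerationCaches:
    """One generation-number cache per view of the history."""

    def __init__(self):
        self._by_view = {}

    def store(self, key, generations):
        self._by_view[key] = dict(generations)

    def lookup(self, key, commit):
        cache = self._by_view.get(key)
        if cache is None:
            # No cache matches this view; the caller has to fall back
            # to an ordinary traversal.
            return None
        return cache.get(commit)
```

With this shape, a run with replacements turned off maps to the same
key as a repository that has no replace refs at all, so the two can
share a cache.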

> >>> I have patches that generate and store the numbers at pack time, similar
> >>> to the way we do the reachability bitmaps.
> 
> Ah, so those cached generation numbers are generated and stored at pack
> time. Where do you store them: is it a separate file? Bitmap file? Packfile?

There were a few iterations of the concept over the years, but the
pack-time one uses a separate file with the same name prefix as a pack
(similar to the way bitmaps are stored). The big advantage there is that
we can piggy-back on the pack .idx to avoid having to write each sha1
again (20 bytes per commit, whereas the actual data we're caching is
only 4 bytes).
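
To make the space argument concrete, here is a small sketch of a side
file that stores only 4-byte values and relies on the sorted id list
(standing in for the pack .idx) for lookups. The layout is an
assumption for illustration: one big-endian uint32 per index entry,
with 0 meaning "no generation recorded". The real file format is not
described in this thread, and the function names are hypothetical.

```python
import bisect
import struct


def write_generations(sorted_ids, generation_of):
    """Emit one big-endian uint32 per entry, in the index's sorted order.

    The ids themselves are never written; an entry's position in
    sorted_ids is what ties a value back to its object.
    """
    return b"".join(
        struct.pack(">I", generation_of.get(oid, 0)) for oid in sorted_ids
    )


def read_generation(sorted_ids, blob, oid):
    """Binary-search the index for oid, then read the value at that slot."""
    pos = bisect.bisect_left(sorted_ids, oid)
    if pos == len(sorted_ids) or sorted_ids[pos] != oid:
        return None  # object is not in this pack
    (gen,) = struct.unpack_from(">I", blob, 4 * pos)
    return gen
```

The side file costs 4 bytes per entry instead of 24, at the price of
being usable only together with the index it was written against.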

> > At GitHub we are using them for --contains analysis, along with mass
> > ahead/behind (e.g., as in https://github.com/gitster/git/branches). My
> > plan is to send patches upstream, but they need some cleanup first.
> 
> That would be nice to have, please.
> 
> Er, is mass ahead/behind something that can be plugged into Git
> (e.g. for "git branch -v -v"), or is it something GitHub-specific?

We have a custom command, "git ahead-behind", where you can specify
arbitrary pairs of commits on stdin. But it's all backed by a function
which, yes, could be plugged into "branch -v -v". It caches any bitmaps
it needs, so if you are doing 100 ahead/behind comparisons against
"master", for example, it only has to find the bitmap for "master" once
(remember that we sometimes have to traverse to complete a bitmap when
a branch has been updated since the last repack).
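
The caching described above can be sketched roughly as follows. This is
a toy model, not the "git ahead-behind" implementation: the history is
a plain dict of commit -> parents, bitmaps are Python integers, and
every bitmap is built by traversal rather than loaded from an on-disk
bitmap file. All names are hypothetical.

```python
def reachable_bitmap(graph, positions, tip):
    """Bitset (as an int) of every commit reachable from tip, inclusive."""
    bits = 0
    stack = [tip]
    while stack:
        commit = stack.pop()
        bit = 1 << positions[commit]
        if bits & bit:
            continue  # already visited
        bits |= bit
        stack.extend(graph[commit])
    return bits


def ahead_behind_many(graph, pairs):
    """Return (ahead, behind) for each (tip, base) pair.

    Each tip's bitmap is computed at most once, so comparing many
    branches against the same base only pays for that base once.
    """
    positions = {commit: i for i, commit in enumerate(graph)}
    cache = {}

    def bitmap(tip):
        if tip not in cache:
            cache[tip] = reachable_bitmap(graph, positions, tip)
        return cache[tip]

    results = []
    for tip, base in pairs:
        a, b = bitmap(tip), bitmap(base)
        ahead = bin(a & ~b).count("1")   # in tip but not in base
        behind = bin(b & ~a).count("1")  # in base but not in tip
        results.append((ahead, behind))
    return results
```

Once both bitmaps are in hand, each comparison is just two AND-NOT
operations and two population counts.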

-Peff