Re: Storing additional information in commit headers

On Mon, Aug 01, 2011 at 11:11:04PM +0200, martin f krafft wrote:

> >   1. Does git actually care about your data? E.g., would it want to use
> >      it for reachability analysis in git-fsck?
> > 
> >   2. Is it an immutable property of a commit, or can it be changed after
> >      the fact?
> 
> Excellent points, and I have answers to both:
> 
>   1. Ideally, I would like to point to another blob containing
>      information. Right now, in order to prevent gc from pruning
>      it, that would have to be a commit pointed to with a parent
>      pointer, which is just not right (it's not a parent) and causes
>      the commit to show up in the history (which it should not, as
>      it's an implementation detail).

In that case, notes sound like a nice solution, as that is exactly what
they do. Yes, they are mutable, but that might not be that big a deal.
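To make the mutability point concrete, here is a minimal sketch. It assumes
a throwaway repository and a notes ref named "topgit"; both the ref name and
the note contents are made up for illustration. The note can be rewritten
after the fact while the commit id stays the same.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"

# Attach data to HEAD under refs/notes/topgit (hypothetical ref name).
git notes --ref=topgit add -m "v1" HEAD
before=$(git rev-parse HEAD)

# Overwrite the note; -f replaces the existing one.
git notes --ref=topgit add -f -m "v2" HEAD
after=$(git rev-parse HEAD)

# The commit is untouched; only the note changed.
note=$(git notes --ref=topgit show HEAD)
echo "commit: $after, note: $note"
```

The note blob hangs off the notes ref rather than the commit, so nothing
extra shows up as a parent in the history.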

>   2. It is immutable. Ideally, I would like to store extra
>      information for a ref in refs/heads/*, but there seems to be no
>      way of doing this. Hence, I need to store it in commits and
>      backtrack for it. Or so I think, at least…

Wait, so you want metadata on a _ref_, not on a commit? That is a very
different thing, I think. We usually accomplish that with data in
.git/config. Or if you need to push data between repos, or if it's too
big to easily fit in the config, then put it in a blob and keep a
parallel ref structure (e.g., refs/topgit/bases/refs/heads/master).
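Roughly, the two options could look like the sketch below. The config key
"topgit.master.base" and the stored values are assumptions made up for this
example; only the parallel ref name comes from the text above.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Option 1: small, local-only metadata lives in .git/config.
# The key name is hypothetical.
git config topgit.master.base "some-value"
from_config=$(git config --get topgit.master.base)

# Option 2: larger or pushable metadata goes into a blob, with a
# parallel ref pointing at it so it stays reachable.
blob=$(printf 'metadata for master\n' | git hash-object -w --stdin)
git update-ref refs/topgit/bases/refs/heads/master "$blob"
from_ref=$(git cat-file -p refs/topgit/bases/refs/heads/master)

echo "config: $from_config"
echo "ref:    $from_ref"
```

Refs outside refs/heads/ may point at any object type, which is why a bare
blob works here. Pushing such refs needs an explicit refspec, since they are
not covered by the default ones.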

Or maybe I'm just misunderstanding.

> > Otherwise, if (1) is yes, then a commit header makes sense. But
> > then, it should also be something that git is taught about, and
> > your commit header should not be some topgit-specific thing, but
> > a header showing the generalized form.
> 
> I agree entirely and would be all too excited to see this happening.
> I already had ideas too:
> 
>   In addition to the standard tree and parent pointers, there could
>   be *-ref and x-*-ref headers, which take a single ref argument,
>   presumably to a blob containing more data.

I'm not sure how well-defined that is, though. What does the ref mean?
What does it point to, and what is the meaning with respect to the
original commit? Or are you suggesting that "*" would be "topgit-base"
here, and that git core would understand only that any header matching
the pattern "x-*-ref" should be followed with respect to
reachability/pruning? Only the owner of the "*" part (topgit in this
case) would be able to make sense of the meaning of the ref.

If that is the case, that does make sense to me. It's basically an
immutable version of a note.

However, implementing such a thing would mean you have an awkward
transition period where some versions of git think the referenced object
is relevant, and others do not. That's something we can overcome, but
it's going to require code in git, and possibly a dormant introduction
period.

I suspect you would give git people more warm fuzzies about implementing
this by showing a system that is built on git-notes and saying "this
works really well, except that the external note storage is not a good
fit because {it's mutable, it's not efficient, whatever other reason
you find}". And then we know that the system is proven to work, and that
migrating the note-like structure into the object is sensible.

But I get the impression you're one step back from that now. So it makes
sense to me to at least prototype it via git-notes, which will give you
the same semantic storage (a mapping of commits to some blobs, with
reachability handled automatically).
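As a rough sketch of what such a prototype gets for free, assuming a
throwaway repository and a hypothetical notes ref "topgit": the notes tree
is the commit-to-blob mapping, and the blobs survive an aggressive prune
because they are reachable from the notes ref.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# One blob of extra data per commit (contents are made up).
git notes --ref=topgit add -m "data for first" HEAD~1
git notes --ref=topgit add -m "data for second" HEAD

# Each line of "list" is "<note blob> <annotated commit>".
git notes --ref=topgit list
count=$(git notes --ref=topgit list | wc -l | tr -d ' ')

# Prune everything unreachable; the note blobs must still be there.
git gc -q --prune=now
after_gc=$(git notes --ref=topgit show HEAD~1)
echo "after gc: $after_gc"
```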

> > Otherwise, the usual recommendation is to use a pseudo-header
> > within the body of the commit message (i.e., "Topgit-Base: ..." at
> > the end of the commit message). The upside is that it's easy to
> > create, manipulate, and examine using existing git tools. The
> > downside is that it is something that the user is more likely to
> > see in "git log" or when editing a rebased commit message.
> 
> … to see *and to accidentally mess up*. And while that may even be
> unlikely, it does expose information that really ought to be hidden.

I'm not quite sure what the information is, so I can't really judge. Do
you have a concrete example?

I got the impression earlier you were wanting to store a human-readable
text string.  That makes a pseudo-header a reasonable choice. But if you
are going to reference some blob (which it seems from what you wrote
above), and you are interested in proper reachability analysis, then no,
it probably isn't a good idea.
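For the human-readable case, a pseudo-header needs nothing beyond existing
tools. A minimal sketch follows; the value "some-branch-name" is a
placeholder invented for this example, not something topgit defines.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# The second -m becomes its own paragraph at the end of the message.
git commit -q --allow-empty \
    -m "Add feature" \
    -m "Topgit-Base: some-branch-name"

# Read the pseudo-header back out of the commit message body.
base=$(git log -1 --format=%B | sed -n 's/^Topgit-Base: //p')
echo "base: $base"
```

Note that the value is only text as far as git is concerned: nothing it
names is kept alive by fsck or gc.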

> I can see how it's arguable too why one would want to give git
> commit objects the ability to reference arbitrary blobs containing
> additional information. I suppose the answer to this question is
> related to the answer to the question of whether Git is
> a contained/complete tool as-is, or also serves as
> a "framework"/"toolkit" for advanced/creative use.
> 
> The availability of the porcelain commands seems to suggest that
> extensible/flexible additional features should be welcome! ;)

I think extensibility is welcome. It's just that most discussions so far
have ended up realizing that a new header would just be cruft. Maybe
yours is different. I'm still not 100% sure I understand what you want
to accomplish, but the idea of an x-*-ref header is a reasonable thing
for git to have.

-Peff