Re: [PATCH v2 1/4] notes: preserve object type given by "add -C"

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 11 May 2012 23:12:27 -0700

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Sat, May 12, 2012 at 4:16 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> ...
>> It is not automatically "converting"; as far as the notes subsystem is
>> concerned, the data you throw at it to be associated with an object the
>> note annotates has always been uninterpreted stream of bytes. If an
>> application wants to store the raw representation of a commit object
>> including the log message and its header, it has every right to expect
>> that it can use "git cat-file commit $argument_to_the_C_option" as the
>> source of that uninterpreted stream of bytes, doesn't it?
>
> Some part of git-notes expects this stream of bytes to be textual,
> human readable. In that case, "git notes add -C commit/tag"'s stuffing
> a blob with the given commit/tag content to notes tree may make sense.
> Personally I'd rather stuff the commit/tag object instead so no new
> blobs are created. The end result is the same: read_sha1_file() always
> return correct text, so does "git notes show".

No, the end result is definitely not the same.

There are two important characteristics of "uninterpreted byte stream" the
above thinking is not taking into consideration:

 (1) we do not interpret what the application stores; and
 (2) the application is *not* limited by our type system.

Suppose the application happens to want to stuff the contents it took from
a commit object, and "add -C $objecname" is a convenient way to do so.  We
have recorded it as "blob" because it is uninterpreted stream of bytes. If
you record that as a leaf note in the note tree, does that mean the note
tree now suddenly have a submodule?  Hell, no.

What if the application wanted to record the contents of a tree object
instead?  How would that affect the fan-out mechanism the note subsystem
uses to hash the 40-hexadecimal object names?  After descending the notes
tree to consume the object name to reach the leaf node, it still finds
even more level hanging below.  Not very careful "list all object names
that have notes attached in this note tree" implementation may end up
descending into the tree object, because of this patch.  Another bad
implication of such a change is that suddenly we would start including all
the subobjects in that tree object when computing the reachability of the
notes tree.

The application needs to have a way to tell what is in the data it stores
anyway, because it is not necessarily "enhancing git" kind of application
that talks about relationships between git objects.  And it may do so
either by convention (e.g. my "notes/amlog" notes tree only records a
single-line text that is a Message-Id header of the original patch e-mail
commits came from) or by having its own way to identify the application
specific data type (e.g. json, pickle, protobuf, etc.).  It is pointless
to say "Git object types can be stored natively, but there is only one
type of blob so the application needs to find a way to coax the types of
data it wants to store itself."  It needs to do so anyway, and having
native and standardized way only for git object types does not improve the
system.  It only ties our hands going forward because we need to define
what it _means_ to store non-blob types in the notes tree, and support
that forever.

So this 1/4 patch is _not_ a bugfix at all.  It breaks perfectly good
current storage semantics without no good reason.

For that matter, as long as $EDITOR is set to something appropriate for
the application specific data, there is no reason to forbid editing,
either.

The only sensible safety against "oops, I forgot that this notes tree
stores binary gunk" I can think of offhand that won't cripple sensible use
case is to sample the data to see if it is binary and ask confirmation,
similar to how "less" asks "frotz may be a binary file. See it anyway?",
and do so only when we are spewing it to the terminal.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html