Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes: > On Sat, May 12, 2012 at 4:16 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > ... >> It is not automatically "converting"; as far as the notes subsystem is >> concerned, the data you throw at it to be associated with an object the >> note annotates has always been uninterpreted stream of bytes. If an >> application wants to store the raw representation of a commit object >> including the log message and its header, it has every right to expect >> that it can use "git cat-file commit $argument_to_the_C_option" as the >> source of that uninterpreted stream of bytes, doesn't it? > > Some part of git-notes expects this stream of bytes to be textual, > human readable. In that case, "git notes add -C commit/tag"'s stuffing > a blob with the given commit/tag content to notes tree may make sense. > Personally I'd rather stuff the commit/tag object instead so no new > blobs are created. The end result is the same: read_sha1_file() always > return correct text, so does "git notes show". No, the end result is definitely not the same. There are two important characteristics of "uninterpreted byte stream" the above thinking is not taking into consideration: (1) we do not interpret what the application stores; and (2) the application is *not* limited by our type system. Suppose the application happens to want to stuff the contents it took from a commit object, and "add -C $objecname" is a convenient way to do so. We have recorded it as "blob" because it is uninterpreted stream of bytes. If you record that as a leaf note in the note tree, does that mean the note tree now suddenly have a submodule? Hell, no. What if the application wanted to record the contents of a tree object instead? How would that affect the fan-out mechanism the note subsystem uses to hash the 40-hexadecimal object names? After descending the notes tree to consume the object name to reach the leaf node, it still finds even more level hanging below. Not very careful "list all object names that have notes attached in this note tree" implementation may end up descending into the tree object, because of this patch. Another bad implication of such a change is that suddenly we would start including all the subobjects in that tree object when computing the reachability of the notes tree. The application needs to have a way to tell what is in the data it stores anyway, because it is not necessarily "enhancing git" kind of application that talks about relationships between git objects. And it may do so either by convention (e.g. my "notes/amlog" notes tree only records a single-line text that is a Message-Id header of the original patch e-mail commits came from) or by having its own way to identify the application specific data type (e.g. json, pickle, protobuf, etc.). It is pointless to say "Git object types can be stored natively, but there is only one type of blob so the application needs to find a way to coax the types of data it wants to store itself." It needs to do so anyway, and having native and standardized way only for git object types does not improve the system. It only ties our hands going forward because we need to define what it _means_ to store non-blob types in the notes tree, and support that forever. So this 1/4 patch is _not_ a bugfix at all. It breaks perfectly good current storage semantics without no good reason. For that matter, as long as $EDITOR is set to something appropriate for the application specific data, there is no reason to forbid editing, either. The only sensible safety against "oops, I forgot that this notes tree stores binary gunk" I can think of offhand that won't cripple sensible use case is to sample the data to see if it is binary and ask confirmation, similar to how "less" asks "frotz may be a binary file. See it anyway?", and do so only when we are spewing it to the terminal. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html