[PATCH/RFC] Document format of basic Git objects

The formats of the basic objects are pretty simple and (I think)
well-known. Still, it is good to document them; at the very least we
can then keep track of how an object format evolves. The commit
object, for example, has learned "encoding" over the years and, more
recently, GPG signing.

This is just a draft text with a bunch of fixmes, but I'd like to hear
from the community whether this is a worthwhile effort and, if so,
whether git-cat-file is the proper place for it. Alternatively, we
could put the relevant text in commit-tree, write-tree and mktag, and
refer to them from cat-file, because cat-file can show raw objects.

So comments?

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 PS. This also makes me wonder whether the tag object supports
 "encoding". Haven't dug down in history yet.
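
 To make the layouts described in the patch below concrete, here is a
 minimal Python sketch of how a reader could split a raw tree body into
 entries and a raw tag or commit into header fields and body. The names
 parse_tree and parse_header are hypothetical, not anything in Git, and
 the header parser assumes every field fits on one line, so it does not
 handle multi-line fields such as a GPG signature in the header.

```python
import hashlib


def parse_tree(body: bytes):
    """Split a raw tree body into (mode, name, sha1_hex) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)       # mode ends at the first space
        nul = body.index(b"\0", space + 1)  # name ends at the first NUL
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul]
        sha1 = body[nul + 1:nul + 21]       # 20 raw bytes, not hex
        if len(sha1) != 20:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, sha1.hex()))
        pos = nul + 21
    return entries


def parse_header(raw: bytes):
    """Split a raw tag or commit into ([(field, value), ...], body).

    Assumption: one field per line, no continuation lines.
    """
    head, _, body = raw.partition(b"\n\n")  # first blank line ends header
    fields = []
    for line in head.split(b"\n"):
        key, _, value = line.partition(b" ")
        fields.append((key.decode("ascii"), value))
    return fields, body


# Illustration only: the same SHA-1 is reused as a placeholder for both
# entries; it is the hash of an empty blob ("blob 0" NUL, no content).
empty_blob = hashlib.sha1(b"blob 0\0").digest()
tree = b"100644 a.txt\0" + empty_blob + b"40000 dir\0" + empty_blob
tag = (b"object " + empty_blob.hex().encode("ascii") + b"\n"
       b"type blob\n"
       b"tag v0\n"
       b"tagger A U Thor <author@example.com> 0 +0000\n"
       b"\n"
       b"message\n")

for mode, name, sha1_hex in parse_tree(tree):
    print(mode, name.decode("utf-8"), sha1_hex)
fields, message = parse_header(tag)
print([key for key, _ in fields], message)
```

 Entry names and field values are kept as bytes because the format does
 not tie them to a particular text encoding.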

 Documentation/git-cat-file.txt |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 2fb95bb..e3dd6d9 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -100,6 +100,46 @@ for each object specified on stdin that does not exist in the repository:
 <object> SP missing LF
 ------------
 
+OBJECT FORMAT
+-------------
+
+A tree object consists of a series of tree entries sorted in memcmp()
+order by entry name. Each entry consists of:
+
+- POSIX file mode encoded in octal ASCII
+- One space character
+- Entry name, terminated by a single NUL character
+- 20-byte binary SHA-1 of the object the entry refers to
+
+A tag object is plain ASCII text in a format similar to email
+(RFC 822). It consists of a header and a body, separated by a blank
+line. The header includes exactly four fields in the following order:
+
+1. "object" field, followed by the SHA-1 in ASCII of the tagged object
+2. "type" field, followed by the type in ASCII of the tagged object
+   (one of "commit", "tag", "blob" or "tree", without quotes,
+   case-sensitive)
+3. "tag" field, followed by the tag name
+4. "tagger" field, followed by the <XXX, to be named>
+
+The tag body contains the tag's message and possibly a GPG signature.
+
+A commit object has a format similar to that of a tag object. The
+commit body is plain text in the chosen encoding (UTF-8 by default).
+The commit header has the following fields, in the order listed:
+
+1. One "tree" field, followed by the SHA-1 in ASCII of the commit's tree
+2. Zero or more "parent" fields, each with a parent's SHA-1 in ASCII
+3. One "author" field, in <XXX to be named> format
+4. One "committer" field, in <XXX to be named> format
+5. Optionally one "encoding" field, followed by the encoding used for
+   the commit body
+6. GPG signature (fixme)
+
+More headers after these fields are allowed. Unrecognized header
+fields must be kept untouched if the commit is rewritten. However, a
+compliant Git implementation produces the above header fields only.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
1.7.8.36.g69ee2
