[PATCH/RFC v2] Document format of basic Git objects

Still a draft for discussion. Of the three people who participated in
this thread, two favor a man page (Jonathan and me) and one favors
technical/ (Junio), so let's make it a man page for now.

Some notes:

 - I'm tempted to include pack-format.txt because I also document
   loose object format here. If it's included and
   gitrepository-layout.txt links to this, we have a quite complete
   documentation of what's inside $GIT_DIR (assuming rebase-apply and
   such are of private use)

 - Not sure if we fix the order of gpgsig and mergetag, or they can be
   mixed together. Also not sure if we can have multiple gpgsig, I
   haven't checked the code.

 - I skipped the experimental loose object format (it's what it's
   called in sha1_file.c). I think we can call it deprecated and move
   on.

 - Do we assume tag/commit header in utf-8 or ascii?

 - We don't do any encoding on ident strings, right?
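
For anyone reviewing the header rules in the patch, the following is a
rough Python sketch of how a parser might read the commit/tag header
description (fields, continuation lines, ident strings). It is not part
of the patch and is not how git itself parses objects. The names
parse_header, parse_ident and IDENT_RE are made up for illustration,
and the single SP between the name and `<` in an ident string is an
assumption, since the text does not spell it out.

```python
import re


def parse_header(payload: bytes):
    """Split a commit or tag payload into (fields, message).

    fields is a list of (name, value) pairs in order of appearance.
    """
    # The header ends at the first empty line; the rest is the message.
    header, _, message = payload.partition(b"\n\n")
    fields = []
    for line in header.split(b"\n"):
        if line.startswith(b" "):
            # Continuation line: drop the leading SP and append to the
            # previous field, keeping the LF that ended the previous line.
            if not fields:
                raise ValueError("continuation line before any field")
            name, value = fields[-1]
            fields[-1] = (name, value + b"\n" + line[1:])
        else:
            name, sp, value = line.partition(b" ")
            if not sp:
                raise ValueError("header line without SP: %r" % line)
            fields.append((name.decode("ascii"), value))
    return fields, message


# Assumption: when a name is present, one SP separates it from "<".
IDENT_RE = re.compile(
    r"(?:(?P<name>[^<]*?) )?"
    r"<(?P<email>[^>]*)> "
    r"(?P<when>[0-9]+) "
    r"(?P<tz>[+-][0-9]{4})"
)


def parse_ident(ident: str):
    """Return (name, email, seconds_since_epoch, timezone)."""
    m = IDENT_RE.fullmatch(ident)
    if m is None:
        raise ValueError("malformed ident string: %r" % ident)
    return (m.group("name"), m.group("email"),
            int(m.group("when")), m.group("tz"))
```

Because the sketch keeps fields in the order they appear and allows
repeated names, it reads "mergetag" and "gpgsig" in either order; it
says nothing about which order git writes or requires.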

Mostly-written-by: Jonathan Nieder <jrnieder@xxxxxxxxx>
Mostly-written-by: Junio C Hamano <gitster@xxxxxxxxx>
Remaining-stolen-from: Documentation/user-manual.txt
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 Documentation/git-object.txt |  273 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 273 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-object.txt
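
Not part of the patch: a minimal Python sketch of the "Object SHA-1"
and "Loose objects" rules as the new man page describes them, so the
wording can be checked against real object names. The function names
(object_header, object_name, loose_object) are hypothetical and only
serve this example.

```python
import hashlib
import zlib


def object_header(obj_type: str, payload: bytes) -> bytes:
    """Build "<type> SP <decimal payload length> NUL"."""
    if obj_type not in ("commit", "tree", "tag", "blob"):
        raise ValueError("unknown object type: %r" % obj_type)
    length = str(len(payload)).encode("ascii")
    return obj_type.encode("ascii") + b" " + length + b"\0"


def object_name(obj_type: str, payload: bytes) -> str:
    """Hex SHA-1 computed over the header followed by the payload."""
    data = object_header(obj_type, payload) + payload
    return hashlib.sha1(data).hexdigest()


def loose_object(obj_type: str, payload: bytes) -> bytes:
    """zlib-compressed header plus payload, as stored in a loose object."""
    return zlib.compress(object_header(obj_type, payload) + payload)
```

With these rules, object_name("blob", b"") gives
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known name of the
empty blob.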

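Also not part of the patch: a small Python sketch that walks a tree
payload using the entry layout from the "Encoding" section (octal mode,
SP, NUL-terminated filename, 20-byte binary object name). parse_tree is
a made-up name, and the sketch does not check modes, sort order or
duplicate names.

```python
def parse_tree(payload: bytes):
    """Return a list of (mode, filename, hex_object_name) tuples."""
    entries = []
    pos = 0
    while pos < len(payload):
        # Mode runs up to the first SP, filename up to the next NUL;
        # bytes.index raises ValueError if a delimiter is missing.
        sp = payload.index(b" ", pos)
        nul = payload.index(b"\0", sp + 1)
        mode = payload[pos:sp].decode("ascii")
        name = payload[sp + 1:nul]
        raw_name = payload[nul + 1:nul + 21]
        if len(raw_name) != 20:
            raise ValueError("truncated object name in tree entry")
        entries.append((mode, name, raw_name.hex()))
        pos = nul + 21
    return entries
```

Filenames are kept as bytes here because the format places no encoding
requirement on them.
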
diff --git a/Documentation/git-object.txt b/Documentation/git-object.txt
new file mode 100644
index 0000000..359af37
--- /dev/null
+++ b/Documentation/git-object.txt
@@ -0,0 +1,273 @@
+git-object(5)
+=============
+
+NAME
+----
+git-object - Git object format
+
+SYNOPSIS
+--------
+$GIT_DIR/objects/*
+
+OBJECT ON-DISK REPRESENTATIONS
+------------------------------
+Objects can be stored on disk as loose (unpacked) objects or
+in packs. Loose objects are in $GIT_DIR/objects/[0-9a-f][0-9a-f]
+directories. Packs are $GIT_DIR/objects/pack/pack-*.pack. Each pack
+has a corresponding index file to speed up pack access.
+
+Object SHA-1
+~~~~~~~~~~~~
+An object SHA-1 is calculated on its header and payload. The content
+to be consumed by SHA-1 calculation is:
+
+- Object type in ascii, either "commit", "tree", "tag" or "blob"
+  (without quotes)
+- One space (ASCII SP)
+- The payload length in ascii canonical decimal format
+- ASCII NUL
+- Object payload
+
+Loose objects
+~~~~~~~~~~~~~
+A loose object is simply the object's header and payload, as
+described in the Object SHA-1 section above, compressed with
+zlib(3).
+
+Packed objects
+~~~~~~~~~~~~~~
+FIXME maybe include Documentation/pack-format.txt
+
+BLOB OBJECTS
+------------
+Blob object payload is file data.
+
+TREE OBJECTS
+------------
+Tree object payload contains a list of entries sorted by filename,
+each with a mode, object name, and filename; the object type is implied
+by the mode. It represents the contents of a single directory tree.
+
+The object type may be a blob, representing the contents of a file,
+another tree, representing the contents of a subdirectory, or a commit
+(representing a subproject). Since trees and blobs, like all other
+objects, are named by a hash of their contents, two trees have the
+same object name if and only if their contents (including,
+recursively, the contents of all subdirectories) are identical. This
+allows git to quickly determine the differences between two related
+tree objects, since it can ignore any entries with identical object
+names.
+
+Note that regular files always have mode 100644 or 100755: git only
+pays attention to the executable bit.
+
+Encoding
+~~~~~~~~
+Entries are of variable length and self-delimiting. Each entry
+consists of
+
+- a POSIX file mode in octal ascii representation, no 0 padding to the
+  left
+- exactly one space (ASCII SP)
+- filename for the entry, as a NUL-terminated string
+- 20-byte binary object name
+
+The mode should be 100755 (executable file), 100644 (regular file),
+120000 (symlink), 40000 (subdirectory), or 160000 (subproject), with
+no leading zeroes. Modes with one leading zero and the synonym 100664
+for 100644 are also accepted for historical reasons. Other modes are
+not accepted.
+
+The filename may be an arbitrary nonempty string of bytes, as long as
+it contains no '/' or NUL character.
+
+The associated object must be a valid blob if the mode indicates a
+file or symlink, tree if it indicates a subdirectory, or commit if it
+indicates a subproject. The blob associated with a symlink entry
+holds the link target and its content must not have any embedded NULs.
+
+Sorting
+~~~~~~~
+Entries are sorted by memcmp(3) on file name, with subdirectory names
+compared as if they ended with '/'. No duplicate file names are allowed.
+
+COMMIT OBJECTS
+--------------
+The commit object links a physical state of a tree with a description
+of how we got there and why. Commit object payload contains the
+associated tree SHA-1, parent commits' SHA-1s, and author and committer
+information.
+
+------------------------------------------------
+$ git cat-file commit 81d48f0aee54
+tree 093f37084c133795e4ce71befa57185328737171
+parent f5e4e20faa1eee3feaa0394897bbd1aca544e809
+parent 661db794eb8179c7bea02f159bb691a2fff4a8e0
+parent 14c173eb63432ba5d0783b6c4b23a8fe0c76fb0f
+author Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> 1326576355 -0800
+committer Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> 1326576355 -0800
+mergetag object 661db794eb8179c7bea02f159bb691a2fff4a8e0
+ type commit
+ tag devicetree-for-linus
+ tagger Grant Likely <grant.likely@xxxxxxxxxxxx> 1326520038 -0700
+ 
+ 2nd set of device tree changes for v3.3
+ -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1.4.11 (GNU/Linux)
+ 
+ iQIcBAABAgAGBQJPERbzAAoJEEFnBt12D9kBmDIP/R9Vspc6yhjSAEvdp/VET2gi
+ TgAQfdp4VuYjjIt4cUPO5UQU9kw478GjTuP2blZEC9DlG1jSf/L8U+A7FHJIVVzU
+ QfjwV1Lqaqk+sQQ1bsp2ixbesKECmqU9IweOIFmn0U2ZD+xlPFIpE2iTKEqymejf
+ PVZsFlkVmhQZgudPNieyZMjQpQ9hEb6UcSfXT//nmoRRxCL/PiMHGRx3UdS3eRe7
+ FApSW0Mty/PD07QXPsDjg1GvK59Gf6R1/4Bd31+rXEz9yaxf4I4I02fL553NDVIt
+ tAPfo/4YKW1rLMWQRkAUqCaMk9v/DWxeWYbbiJNZ2R3kys9o8k26XXxvcuYnecS2
+ G8DDJpmOikbN3Gvlskh40Tn3TJb5Wlgc7o/10L/fq6FovS4Uk7yUeFMqXUYfl8TU
+ ziIlrlt9IGabXBN4JKJl3OabgkeO+Oz9DKhTQFJLY4/121LAtFVk3xd316mY+wpX
+ mI83VmWMlp3sK+OLr+UdMTCXZvSIpu3KlGKMpAssHKUKxIV20NHLFNbm94/ywXBn
+ Zb8arjcv7+WzwhSqQJj851cq4/sEYx5HB4wU5Nm5SXBwcO3ixiij6lHCoHU+NudR
+ eyPIFLfrzwnUu3yTRgUfAnkgOce+2I+vUsU4pXUR6FyK73wSmm0+4WXQfB+OBlwD
+ 2O1RjZedZCb6zzf17H2k
+ =mup8
+ -----END PGP SIGNATURE-----
+mergetag object 14c173eb63432ba5d0783b6c4b23a8fe0c76fb0f
+ type commit
+ tag spi-for-linus
+ tagger Grant Likely <grant.likely@xxxxxxxxxxxx> 1326520366 -0700
+ 
+ SPI bug fixes for v3.3
+ -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1.4.11 (GNU/Linux)
+ 
+ iQIcBAABAgAGBQJPERgyAAoJEEFnBt12D9kBRMsP/RBv6kWIb/qD7yJhrdbzJ4Tv
+ 1f7coSytuHupZVpxJstELKPugRmp2R6YeFbKw8P4P/12233Q0FcdKTF6ZE2h3cBp
+ bfCtyyzlFeY/nMfJKkwh37x2fHxNHynCCJEjHhecLday7NKQoTmmafivTfVmolWK
+ /MGjDarTAzC1FaP1xpBnuiI8eCr5WIgb4WmtvOmxIntVT077xggdJLL/Co7fBCqn
+ iibz3U/VyC68kQTGw6ELhnW1d7doHp7H3DJ2gPsh6lzpbv8JAnOMPpD+3Me1DVHE
+ Ay0kxPHV4bqnDyB+uEGppUiNoaTd5InrMAw+udDad60TMwOZzIvMkgxo0PIVM9Mm
+ k6mCcE2+TSnJetueX3cfrS5bRTPxUX7KRDC/WSp67/QPmelbYeRDLR7hrrQVqOPq
+ 5hIKMfz/kTBXcaXk643TEveaZlMuOZxHBYAvsbu5BX/3SQqYFS4POdxdeZVnUf54
+ ITHhftBtrXacCsjKujp0xmKCIpF+8v3yKRxGEQssByv8v+CaymNrEls2vTF8tn5P
+ sAIjPFJYG+IHtDMIsTHOvSPA7uwWYsOVHFEYsbC1758esiBD8+qtfvFS3jAH99z+
+ v2/aGsfMnjYEIsRtSm7PVTybJAo22Gr62yE/Q+rP//O0JaDahgdm009MjUo6BSgg
+ XNhZjQRYAYEExMTjJ2TK
+ =q39P
+ -----END PGP SIGNATURE-----
+
+Merge tags 'devicetree-for-linus' and 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6
+
+2nd set of device tree changes and SPI bug fixes for v3.3
+
+* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6:
+  of/irq: Add interrupts-names property to name an irq resource
+  of/address: Add reg-names property to name an iomem resource
+
+* tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6:
+  spi/tegra: depend instead of select TEGRA_SYSTEM_DMA
+------------------------------------------------
+
+More precisely, a commit object begins with a header of one or more
+lines delimited by ASCII LF. The end of the header is signalled by an
+empty line. Any remaining text after the empty line forms the commit
+message. The header must not contain NUL.
+
+A "continuation line" in the header begins with an SP. The remainder
+of the line, after removing that SP, is concatenated to the previous
+line, while retaining the LF at the end of the previous line.
+
+When a line in the header begins with a character other than SP and
+has at least one SP in it, it is called a "field". A field consists of
+the "field name", which is the string before the first SP on the line,
+and its "value", which is everything after that SP. When the value
+consists of multiple lines, continuation lines are used.
+
+More than one field with the same name can appear in the header of an
+object, and the order in which they appear is significant. A commit
+object can contain these fields in the listed order:
+
+1. one "tree" field with the 40-character textual object name of the
+   associated tree object
+2. zero or more "parent" fields, each with 40-character textual object
+   name of the parent commit object
+3. one "author" field with an ident string
+4. one "committer" field with an ident string
+5. zero or one "encoding" field with an ascii string
+6. zero or more "mergetag" fields with associated tag object content
+7. zero or one "gpgsig" field with gpg signature content
+
+New kinds of fields may be added in later versions of git.
+
+Ident strings
+~~~~~~~~~~~~~
+Ident strings record who is responsible for doing something and at
+what time. For a commit, the ident string in the "author" line records
+who authored the associated changes and when the changes were
+made. The ident string in the "committer" line records who committed
+the changes to the repository and at what time.
+
+An ident string consists of an email address and a timestamp. More
+precisely:
+
+1. Optionally, a name
+2. An email address enclosed in `<` and `>`, followed by one
+   space (ASCII SP)
+3. The number of seconds since Epoch (00:00:00 UTC, January 1, 1970)
+   followed by a space (ASCII SP)
+4. Timezone: either plus or minus sign, followed by 4 decimal digits
+
+Name and email are encoded in UTF-8 and must not contain ASCII
+NUL characters.
+
+Commit encoding
+~~~~~~~~~~~~~~~
+The encoding field names the encoding of the commit message.
+Encoding names must be recognized by iconv(3). By default, the
+commit message is in UTF-8. Encodings that can generate ASCII NUL
+characters are discouraged.
+
+TAG OBJECTS
+-----------
+Tag object payload contains an object name, object type, tag name, the
+name of the person ("tagger") who created the tag, and a message, which
+may contain a signature.
+
+------------------------------------------------
+$ git cat-file tag v1.5.0
+object 437b1b20df4b356c9342dac8d38849f24ef44f27
+type commit
+tag v1.5.0
+tagger Junio C Hamano <junkio@xxxxxxx> 1171411200 +0000
+
+GIT 1.5.0
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.4.6 (GNU/Linux)
+
+iD8DBQBF0lGqwMbZpPMRm5oRAuRiAJ9ohBLd7s2kqjkKlq1qqC57SbnmzQCdG4ui
+nLE/L9aUXdWeTFPron96DLA=
+=2E+0
+-----END PGP SIGNATURE-----
+------------------------------------------------
+
+Tag object format resembles commit format. A tag object may have the
+following fields in the listed order:
+
+1. one "object" field with 40-character textual object name of the
+   tagged object
+2. one "type" field with type of the tagged object ("commit", "tag",
+   "blob", or "tree")
+3. one "tag" field with the name of the tag
+4. one "tagger" field with an ident string
+
+New kinds of fields may be added in later versions of git.
+
+Any remaining text after the header forms the tag message. The tag
+message has no specified encoding. Anything that does not contain
+ASCII NUL characters is accepted.
+
+The object field must point to a valid object of the type indicated
+by the type field. The tag name can be an arbitrary string without NUL
+bytes or embedded newlines; in practice it usually follows the
+restrictions described in linkgit:git-check-ref-format[1].
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
1.7.8.36.g69ee2
