Still a draft for discussion. Of the three people who participated in this
thread, two favor a man page (me and Jonathan) and one favors
Documentation/technical/ (Junio), so let's make it a man page for now.

Some notes:

- I'm tempted to include pack-format.txt because I also document the
  loose object format here. If it's included and
  gitrepository-layout.txt links to this, we have fairly complete
  documentation of what's inside $GIT_DIR (assuming rebase-apply and
  such are for private use).
- Not sure if we fix the order of gpgsig and mergetag, or if they can
  be mixed together. Also not sure if we can have multiple gpgsig
  fields; I haven't checked the code.
- I skipped the experimental loose object format (that's what it's
  called in sha1_file.c). I think we can call it deprecated and move on.
- Do we assume tag/commit headers are in UTF-8 or ASCII?
- We don't do any encoding on ident strings, right?

Mostly-written-by: Jonathan Nieder <jrnieder@xxxxxxxxx>
Mostly-written-by: Junio C Hamano <gitster@xxxxxxxxx>
Remaining-stolen-from: Documentation/user-manual.txt
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 Documentation/git-object.txt | 273 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 273 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-object.txt

diff --git a/Documentation/git-object.txt b/Documentation/git-object.txt
new file mode 100644
index 0000000..359af37
--- /dev/null
+++ b/Documentation/git-object.txt
@@ -0,0 +1,273 @@
+git-object(5)
+=============
+
+NAME
+----
+git-object - Git object format
+
+SYNOPSIS
+--------
+$GIT_DIR/objects/*
+
+OBJECT ON-DISK REPRESENTATIONS
+------------------------------
+Objects can be stored on disk as loose (unpacked) objects or
+in packs. Loose objects are in $GIT_DIR/objects/[0-9a-f][0-9a-f]
+directories, named after the first two hexadecimal digits of the
+object name; the file name is the remaining 38 digits. Packs are
+$GIT_DIR/objects/pack/pack-*.pack. Each pack has a corresponding
+index file (pack-*.idx) to speed up pack access.
+
+Object SHA-1
+~~~~~~~~~~~~
+The SHA-1 of an object (its object name) is computed over its header
+and payload.
+The content to be consumed by the SHA-1 calculation is:
+
+- Object type in ASCII, either "commit", "tree", "tag" or "blob"
+  (without quotes)
+- One space (ASCII SP)
+- The payload length in bytes, in canonical ASCII decimal format
+  (no leading zeros)
+- ASCII NUL
+- Object payload
+
+Loose objects
+~~~~~~~~~~~~~
+A loose object file is simply the object's header and payload, as
+described in the Object SHA-1 section above, compressed with zlib(3).
+
+Packed objects
+~~~~~~~~~~~~~~
+FIXME maybe include Documentation/pack-format.txt
+
+BLOB OBJECTS
+------------
+The payload of a blob object is the file data.
+
+TREE OBJECTS
+------------
+The payload of a tree object is a list of entries, each with a mode,
+an object name, and a filename, sorted by filename; the type of the
+object an entry refers to is implied by its mode. A tree represents
+the contents of a single directory.
+
+The referenced object may be a blob, representing the contents of a
+file, another tree, representing the contents of a subdirectory, or a
+commit (representing a subproject). Since trees and blobs, like all
+other objects, are named by a hash of their contents, two trees have
+the same object name if and only if their contents (including,
+recursively, the contents of all subdirectories) are identical. This
+allows git to quickly determine the differences between two related
+tree objects, since it can ignore any entries with identical object
+names.
+
+Note that regular files are always recorded with mode 100644 or
+100755: git only pays attention to the executable bit.
+
+Encoding
+~~~~~~~~
+Entries are of variable length and self-delimiting. Each entry
+consists of:
+
+- a POSIX file mode as ASCII octal digits, without leading zeros
+- exactly one space (ASCII SP)
+- the filename for the entry, as a NUL-terminated string
+- the 20-byte binary object name
+
+The mode should be 100755 (executable file), 100644 (regular file),
+120000 (symlink), 40000 (subdirectory), or 160000 (subproject), with
+no leading zeros.
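The Object SHA-1, Loose objects and tree Encoding rules above can be exercised with a short script. This is a minimal Python sketch, assuming only the standard `hashlib` and `zlib` modules; the function names are hypothetical helpers, not git APIs, and writing the file under $GIT_DIR/objects is left out. The sort key treats a subdirectory (mode 40000) name as if it ended in '/', which is how git orders tree entries.

```python
from __future__ import annotations

import hashlib
import zlib

OBJECT_TYPES = ("blob", "tree", "commit", "tag")


def object_header(obj_type: str, payload: bytes) -> bytes:
    """Type in ASCII, one SP, payload length in decimal, one NUL."""
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {obj_type}")
    return f"{obj_type} {len(payload)}".encode("ascii") + b"\0"


def object_name(obj_type: str, payload: bytes) -> str:
    """Hex SHA-1 computed over the header followed by the payload."""
    return hashlib.sha1(object_header(obj_type, payload) + payload).hexdigest()


def loose_object(obj_type: str, payload: bytes) -> tuple[str, bytes]:
    """Path relative to $GIT_DIR/objects and the zlib-compressed file content."""
    name = object_name(obj_type, payload)
    data = zlib.compress(object_header(obj_type, payload) + payload)
    return f"{name[:2]}/{name[2:]}", data


def tree_sort_key(entry: tuple[str, bytes, bytes]) -> bytes:
    """Byte-wise key; a subdirectory (mode 40000) compares as if it ended in '/'."""
    mode, name, _ = entry
    return name + b"/" if mode == "40000" else name


def encode_tree(entries: list[tuple[str, bytes, bytes]]) -> bytes:
    """Encode (mode, filename, 20-byte binary object name) entries as a tree payload.

    `mode` is the octal ASCII string without leading zeros, e.g. "100644".
    """
    out = bytearray()
    seen: set[bytes] = set()
    for mode, name, sha in sorted(entries, key=tree_sort_key):
        if not name or b"/" in name or b"\0" in name:
            raise ValueError(f"invalid filename: {name!r}")
        if len(sha) != 20:
            raise ValueError(f"object name for {name!r} must be 20 bytes")
        if name in seen:
            raise ValueError(f"duplicate filename: {name!r}")
        seen.add(name)
        out += mode.encode("ascii") + b" " + name + b"\0" + sha
    return bytes(out)
```

As a sanity check, `object_name("blob", b"hello\n")` should match the output of `echo hello | git hash-object --stdin`, and `object_name("tree", b"")` gives the well-known empty tree name 4b825dc642cb6eb9a060e54bf8d69288fbee4904.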
+Modes with one leading zero and the synonym 100664 for 100644 are
+also accepted for historical reasons. Other modes are not accepted.
+
+The filename may be an arbitrary nonempty string of bytes, as long as
+it contains no '/' or NUL character.
+
+The associated object must be a valid blob if the mode indicates a
+file or symlink, or a tree if it indicates a subdirectory. For a
+subproject, the entry records the name of a commit, which normally
+lives in the subproject's own repository and need not exist in this
+one. The blob associated with a symlink entry holds the link target,
+and its content must not contain any embedded NULs.
+
+Sorting
+~~~~~~~
+Entries are sorted by comparing file names byte by byte, as memcmp(3)
+does, except that the name of a subdirectory entry is compared as if it
+ended with '/'. For example, a file "foo.c" sorts before a subdirectory
+"foo", because '.' sorts before '/'. Duplicate file names are not
+allowed.
+
+COMMIT OBJECTS
+--------------
+The commit object links a physical state of a tree with a description
+of how we got there and why. The payload of a commit object contains
+the SHA-1 of the associated tree, the SHA-1s of its parent commits,
+author and committer information, and the commit message.
+
+------------------------------------------------
+$ git cat-file commit 81d48f0aee54
+tree 093f37084c133795e4ce71befa57185328737171
+parent f5e4e20faa1eee3feaa0394897bbd1aca544e809
+parent 661db794eb8179c7bea02f159bb691a2fff4a8e0
+parent 14c173eb63432ba5d0783b6c4b23a8fe0c76fb0f
+author Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> 1326576355 -0800
+committer Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> 1326576355 -0800
+mergetag object 661db794eb8179c7bea02f159bb691a2fff4a8e0
+ type commit
+ tag devicetree-for-linus
+ tagger Grant Likely <grant.likely@xxxxxxxxxxxx> 1326520038 -0700
+ 
+ 2nd set of device tree changes for v3.3
+ -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1.4.11 (GNU/Linux)
+ 
+ iQIcBAABAgAGBQJPERbzAAoJEEFnBt12D9kBmDIP/R9Vspc6yhjSAEvdp/VET2gi
+ TgAQfdp4VuYjjIt4cUPO5UQU9kw478GjTuP2blZEC9DlG1jSf/L8U+A7FHJIVVzU
+ QfjwV1Lqaqk+sQQ1bsp2ixbesKECmqU9IweOIFmn0U2ZD+xlPFIpE2iTKEqymejf
+ PVZsFlkVmhQZgudPNieyZMjQpQ9hEb6UcSfXT//nmoRRxCL/PiMHGRx3UdS3eRe7
+ FApSW0Mty/PD07QXPsDjg1GvK59Gf6R1/4Bd31+rXEz9yaxf4I4I02fL553NDVIt
+ tAPfo/4YKW1rLMWQRkAUqCaMk9v/DWxeWYbbiJNZ2R3kys9o8k26XXxvcuYnecS2
+ G8DDJpmOikbN3Gvlskh40Tn3TJb5Wlgc7o/10L/fq6FovS4Uk7yUeFMqXUYfl8TU
+ ziIlrlt9IGabXBN4JKJl3OabgkeO+Oz9DKhTQFJLY4/121LAtFVk3xd316mY+wpX
+ mI83VmWMlp3sK+OLr+UdMTCXZvSIpu3KlGKMpAssHKUKxIV20NHLFNbm94/ywXBn
+ Zb8arjcv7+WzwhSqQJj851cq4/sEYx5HB4wU5Nm5SXBwcO3ixiij6lHCoHU+NudR
+ eyPIFLfrzwnUu3yTRgUfAnkgOce+2I+vUsU4pXUR6FyK73wSmm0+4WXQfB+OBlwD
+ 2O1RjZedZCb6zzf17H2k
+ =mup8
+ -----END PGP SIGNATURE-----
+mergetag object 14c173eb63432ba5d0783b6c4b23a8fe0c76fb0f
+ type commit
+ tag spi-for-linus
+ tagger Grant Likely <grant.likely@xxxxxxxxxxxx> 1326520366 -0700
+ 
+ SPI bug fixes for v3.3
+ -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1.4.11 (GNU/Linux)
+ 
+ iQIcBAABAgAGBQJPERgyAAoJEEFnBt12D9kBRMsP/RBv6kWIb/qD7yJhrdbzJ4Tv
+ 1f7coSytuHupZVpxJstELKPugRmp2R6YeFbKw8P4P/12233Q0FcdKTF6ZE2h3cBp
+ bfCtyyzlFeY/nMfJKkwh37x2fHxNHynCCJEjHhecLday7NKQoTmmafivTfVmolWK
+ /MGjDarTAzC1FaP1xpBnuiI8eCr5WIgb4WmtvOmxIntVT077xggdJLL/Co7fBCqn
+ iibz3U/VyC68kQTGw6ELhnW1d7doHp7H3DJ2gPsh6lzpbv8JAnOMPpD+3Me1DVHE
+ Ay0kxPHV4bqnDyB+uEGppUiNoaTd5InrMAw+udDad60TMwOZzIvMkgxo0PIVM9Mm
+ k6mCcE2+TSnJetueX3cfrS5bRTPxUX7KRDC/WSp67/QPmelbYeRDLR7hrrQVqOPq
+ 5hIKMfz/kTBXcaXk643TEveaZlMuOZxHBYAvsbu5BX/3SQqYFS4POdxdeZVnUf54
+ ITHhftBtrXacCsjKujp0xmKCIpF+8v3yKRxGEQssByv8v+CaymNrEls2vTF8tn5P
+ sAIjPFJYG+IHtDMIsTHOvSPA7uwWYsOVHFEYsbC1758esiBD8+qtfvFS3jAH99z+
+ v2/aGsfMnjYEIsRtSm7PVTybJAo22Gr62yE/Q+rP//O0JaDahgdm009MjUo6BSgg
+ XNhZjQRYAYEExMTjJ2TK
+ =q39P
+ -----END PGP SIGNATURE-----
+
+Merge tags 'devicetree-for-linus' and 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6
+
+2nd set of device tree changes and SPI bug fixes for v3.3
+
+* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6:
+  of/irq: Add interrupts-names property to name an irq resource
+  of/address: Add reg-names property to name an iomem resource
+
+* tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6:
+  spi/tegra: depend instead of select TEGRA_SYSTEM_DMA
+------------------------------------------------
+
+More precisely, a commit object begins with a header of one or more
+lines delimited by ASCII LF. The end of the header is signalled by an
+empty line. Any remaining text after the empty line forms the commit
+message. The header must not contain NUL.
+
+A "continuation line" in the header begins with an SP. The remainder
+of the line, after removing that SP, is concatenated to the previous
+line, while retaining the LF at the end of the previous line.
+
+When a line in the header begins with a character other than SP and
+has at least one SP in it, it is called a "field". A field consists of
+the "field name", which is the string before the first SP on the line,
+and its "value", which is everything after that SP. When the value
+consists of multiple lines, continuation lines are used.
+
+More than one field with the same name can appear in the header of an
+object, and the order in which they appear is significant. A commit
+object can contain these fields in the listed order:
+
+1. one "tree" field with the 40-character textual object name of the
+   associated tree object
+2. zero or more "parent" fields, each with the 40-character textual
+   object name of a parent commit object
+3. one "author" field with an ident string
+4. one "committer" field with an ident string
+5. zero or one "encoding" field with an ASCII string
+6. zero or more "mergetag" fields, each with the content of a tag
+   object
+7. zero or one "gpgsig" field with GPG signature content
+
+New kinds of fields may be added in later versions of git.
+
+Ident strings
+~~~~~~~~~~~~~
+Ident strings record who is responsible for doing something, and
+when. For a commit, the ident string in the "author" field records who
+authored the associated changes and when they were made. The ident
+string in the "committer" field records who committed the changes to
+the repository, and when.
+
+An ident string consists of an optional name, an email address, and a
+timestamp. More precisely:
+
+1. Optionally, a name followed by one space (ASCII SP)
+2. An email address enclosed in `<` and `>`, followed by one space
+   (ASCII SP)
+3. The number of seconds since the Epoch (00:00:00 UTC, January 1,
+   1970), followed by one space (ASCII SP)
+4. The timezone offset: a plus or minus sign followed by four decimal
+   digits (hours and minutes), e.g. `-0800`
+
+Name and email are encoded in UTF-8 and must not contain ASCII NUL
+characters.
+
+Commit encoding
+~~~~~~~~~~~~~~~
+The "encoding" field names the character encoding of the commit
+message. Encoding names must be recognized by iconv(3). Without this
+field, the commit message is assumed to be in UTF-8. Encodings that
+can produce ASCII NUL bytes, such as UTF-16, are discouraged.
+
+TAG OBJECTS
+-----------
+The payload of a tag object contains an object name, the type of that
+object, a tag name, the name of the person ("tagger") who created the
+tag, and a message, which may contain a signature.
+
+------------------------------------------------
+$ git cat-file tag v1.5.0
+object 437b1b20df4b356c9342dac8d38849f24ef44f27
+type commit
+tag v1.5.0
+tagger Junio C Hamano <junkio@xxxxxxx> 1171411200 +0000
+
+GIT 1.5.0
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.4.6 (GNU/Linux)
+
+iD8DBQBF0lGqwMbZpPMRm5oRAuRiAJ9ohBLd7s2kqjkKlq1qqC57SbnmzQCdG4ui
+nLE/L9aUXdWeTFPron96DLA=
+=2E+0
+-----END PGP SIGNATURE-----
+------------------------------------------------
+
+Tag object format resembles commit format. A tag object may have the
+following fields in the listed order:
+
+1. one "object" field with the 40-character textual object name of the
+   tagged object
+2. one "type" field with the type of the tagged object ("commit",
+   "tag", "blob", or "tree")
+3. one "tag" field with the name of the tag
+4. one "tagger" field with an ident string
+
+New kinds of fields may be added in later versions of git.
+
+Any remaining text after the header forms the tag message. The tag
+message has no specified encoding; any content that does not contain
+ASCII NUL characters is accepted.
+
+The object field must point to a valid object of the type indicated by
+the type field.
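The header rules in the commit section (fields, continuation lines, and the empty line that ends the header) also describe tag objects, so a single parser can handle both. The following is a minimal Python sketch under stated assumptions: header bytes are decoded as UTF-8, field order and field counts are not validated, and `parse_header` and `parse_ident` are hypothetical helpers rather than anything git provides.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Optional "name SP", then <email>, SP, seconds since the Epoch, SP, +hhmm/-hhmm.
IDENT_RE = re.compile(r"^(?:(.*?) )?<([^<>]*)> (\d+) ([+-])(\d\d)(\d\d)$")


def parse_header(payload: bytes) -> tuple[list[tuple[str, str]], bytes]:
    """Split a commit or tag payload into ordered (name, value) fields and the message."""
    # The first empty line ends the header; everything after it is the message.
    head, _, message = payload.partition(b"\n\n")
    if b"\0" in head:
        raise ValueError("header must not contain NUL")
    fields: list[tuple[str, str]] = []
    for line in head.decode("utf-8").split("\n"):  # UTF-8 is an assumption
        if line.startswith(" "):
            # Continuation line: drop the leading SP, keep the previous line's LF.
            if not fields:
                raise ValueError("continuation line before any field")
            name, value = fields[-1]
            fields[-1] = (name, value + "\n" + line[1:])
        elif " " in line:
            # Field: name is everything before the first SP, value everything after.
            name, value = line.split(" ", 1)
            fields.append((name, value))
        else:
            raise ValueError(f"malformed header line: {line!r}")
    return fields, message


def parse_ident(value: str) -> tuple[str, str, datetime]:
    """Split an ident string into (name, email, timezone-aware timestamp)."""
    m = IDENT_RE.match(value)
    if m is None:
        raise ValueError(f"malformed ident string: {value!r}")
    name, email, seconds, sign, hh, mm = m.groups()
    offset = timedelta(hours=int(hh), minutes=int(mm))
    tz = timezone(-offset if sign == "-" else offset)
    return name or "", email, datetime.fromtimestamp(int(seconds), tz)


if __name__ == "__main__":
    sample = (
        b"object 437b1b20df4b356c9342dac8d38849f24ef44f27\n"
        b"type commit\n"
        b"tag v1.5.0\n"
        b"tagger Junio C Hamano <junkio@xxxxxxx> 1171411200 +0000\n"
        b"\n"
        b"GIT 1.5.0\n"
    )
    fields, message = parse_header(sample)
    print(fields)
    print(parse_ident(dict(fields)["tagger"]))
```

Passing it the raw output of `git cat-file commit <commit>` should return each "mergetag" value as one string, with the leading SP of every continuation line removed and the embedded LFs kept.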
+The tag name can be an arbitrary string without NUL bytes or embedded
+newlines; in practice it usually follows the restrictions described in
+linkgit:git-check-ref-format[1].
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
1.7.8.36.g69ee2