[RFC] Plumbing-only support for storing object metadata

The attached test illustrates a proposal for minimal plumbing support
usable to store permissions, ownership, and other metadata in git
repositories. This proposal is fully compatible with existing
repositories when the new functionality is not in use. Similar to the
introduction of subprojects, we have not yet specified the porcelain. We
believe that the plumbing will provide sufficient functionality for many
uses, and these uses will help determine the appropriate porcelain.

We would have included an implementation along with the test, but we
need help with a detail of git internals. More on that at the end. We'd
also appreciate feedback on the proposal.

We propose representing objects with metadata using a new "inode"
object. An inode object contains the hash of the real object and the
hash of a "props" (properties) object. A props object contains a set of
name-value pairs. Tree objects can reference inode objects in addition
to the current possibilities of blobs, trees, and subproject commits; we
propose using the currently invalid type 110000 (S_IFREG | S_IFIFO) for
inode objects. We primarily see a use case for inodes referencing blobs
and trees, though as defined they support any object type.
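
The mode arithmetic above can be checked in a shell. This is only a
sketch: it assumes the usual octal values S_IFREG = 0100000 and
S_IFIFO = 0010000 (check <sys/stat.h> on your platform), and the
variable names are ours, chosen for illustration.

```shell
#!/bin/sh
# Usual file-type bits, in octal. ASSUMPTION: these match <sys/stat.h>.
S_IFREG=0100000
S_IFIFO=0010000

# The proposed inode type is the bitwise OR of the two.
inode_type=$(printf '%o' $(( $S_IFREG | $S_IFIFO )))

# A tree entry mode combines the type bits with permission bits.
entry_mode=$(printf '%o' $(( $S_IFREG | $S_IFIFO | 0644 )))

echo "inode type: $inode_type"
echo "tree entry mode: $entry_mode"
```

The second value is the mode used for the tree entry in the attached
test.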

By separating property objects from inodes, objects with the same
properties can share the same property object; we expect, for instance,
that repositories reflecting /etc will have many references to the
"root:root 644" and "root:root 755" properties.

Both object types have a unique representation: equivalent inodes and
props objects will have the same hash. The exact format of an inode
looks like:
	<object_type> SP <object_sha1> LF
	props SP <props_sha1> LF
A property object looks like a sorted list of one or more of:
	<key> SP <value> LF
The same key is allowed to appear more than once, in which case the
lines will be sorted by the bytes of the values. Allowing duplicate keys
will make it easier to retrieve a set of similar properties such as
ACLs.
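
As a rough illustration of the canonical form, the sketch below sorts a
props file bytewise and hashes it the way git hashes its existing object
types (SHA-1 over "<type> SP <size> NUL" followed by the content). That
props and inode objects would reuse this header scheme is our
assumption, not something implemented yet. The helper names
canonical_props and object_hash are hypothetical, and the sketch relies
on sha1sum from GNU coreutils being available.

```shell
#!/bin/sh
# Sort "<key> SP <value>" lines bytewise. Keys contain no spaces and SP
# is the lowest printable byte, so sorting whole lines orders by key
# first and then by value. Identical lines are kept as-is here.
canonical_props() {
	LC_ALL=C sort "$1"
}

# Hash a file as git does for existing object types.
# ASSUMPTION: new object types would use the same header scheme.
# usage: object_hash <type> <file>
object_hash() {
	size=$(( $(wc -c < "$2") ))
	{ printf '%s %d\0' "$1" "$size"; cat "$2"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
printf 'owner root\nmode 640\ngroup shadow\n' > "$tmp/props.unsorted"
canonical_props "$tmp/props.unsorted" > "$tmp/props"

cat "$tmp/props"
echo "props hash (under the stated assumption): $(object_hash props "$tmp/props")"
```

Because the lines are sorted before hashing, any ordering of the same
key-value lines yields the same props object.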

This format implies certain constraints on property names and values. We
propose limiting both names and values to printable ASCII (\x20-\x7E),
and disallowing spaces in keys. If some use case requires property names
or values with binary data, that property could use a printable encoding
such as base64.
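
These constraints are easy to check mechanically. The sketch below is
one possible validator, not a proposed implementation. The function name
valid_props is hypothetical, and it assumes that keys must be non-empty,
that values may be empty, and that a props file has at least one line;
the proposal does not yet settle the first two points.

```shell
#!/bin/sh
# Return 0 if every line has the form "<key> SP <value>", where the key
# is printable ASCII without spaces (\x21-\x7E) and the value is
# printable ASCII (\x20-\x7E).
# ASSUMPTIONS: non-empty keys, possibly empty values, at least one line.
valid_props() {
	[ -s "$1" ] || return 1
	# grep -v selects lines that do NOT match; any such line is invalid.
	if LC_ALL=C grep -Evq '^[!-~]+ [ -~]*$' "$1"; then
		return 1
	fi
	return 0
}

tmp=$(mktemp -d)
printf 'group shadow\nmode 640\nowner root\n' > "$tmp/good"
printf 'mode\t640\n' > "$tmp/bad"

valid_props "$tmp/good" && echo "good: accepted"
valid_props "$tmp/bad" || echo "bad: rejected (tab is not printable ASCII)"
```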

We believe this proposal provides a sensible approach to storing
metadata in Git repositories; however, we're happy with any reasonable
solution that provides equivalent functionality. Some alternatives we
considered:

  - We could allow UTF-8 property names or values, rather than strictly
    ASCII. Our proposal is conservative in this regard, allowing an
    extension to UTF-8 later while remaining compatible with existing
    repositories.

  - We could allow arbitrary property names or values, by changing the
    props format to store lengths rather than using delimiters. This
    would not be a compatible change, so it needs to be decided early.

  - Tree objects already store mode bits, but we believe that it would
    prove simpler to store complete modes in properties rather than
    adjusting Git internals to preserve arbitrary mode bits in trees.
    Even if new versions of Git preserved the full mode, existing
    versions of Git might silently give incorrect results. Furthermore,
    mode bits other than executability seem of limited value without
    ownership information.

  - inode objects could directly store properties, rather than
    referencing a separate props object. This would eliminate one
    indirection needed to access properties. However, it would also
    reduce sharing of data for objects with the same properties.
    Furthermore, we expect that the indirection will have negligible
    cost when accessing objects from packs, given appropriately sorted
    packs. Shared props objects also lend themselves to caching at
    various layers.

  - We could have called them "meta" objects instead of "props", but
    then we couldn't make "mad props" jokes.

We began trying to implement this proposal, but we found this enum
definition in cache.h, which made us think there's only room for one
more kind of object:

	enum object_type {
		OBJ_BAD = -1,
		OBJ_NONE = 0,
		OBJ_COMMIT = 1,
		OBJ_TREE = 2,
		OBJ_BLOB = 3,
		OBJ_TAG = 4,
		/* 5 for future expansion */
		OBJ_OFS_DELTA = 6,
		OBJ_REF_DELTA = 7,
		OBJ_ANY,
		OBJ_MAX,
	};

Do these object_type values appear in any on-disk structure, or does any
other reason exist why this set of values cannot change? Can we add
additional object types for inodes and props? If not, what would you
recommend instead?

- Jamey Sharp and Josh Triplett

#!/bin/sh
#
# Copyright (c) 2008 Josh Triplett and Jamey Sharp
#

test_description="Test inode plumbing"

. ./test-lib.sh

cat > shadow <<EOF
root:*:13943:0:99999:7:::
EOF
shadow_sha1=`git hash-object -t blob -w shadow`

cat > props <<EOF
group shadow
mode 640
owner root
EOF
props_sha1=FIXME

cat > inode <<EOF
blob $shadow_sha1
props $props_sha1
EOF
inode_sha1=FIXME

cat > tree <<EOF
110644 inode $inode_sha1	shadow
EOF
tree_sha1=FIXME

test_expect_success 'hash a props' '
	test $props_sha1 = "`git hash-object -t props -w props`"
'

test_expect_success 'cat-file a props' '
	git cat-file props $props_sha1 | cmp -s - props
'

test_expect_success 'hash an inode' '
	test $inode_sha1 = "`git hash-object -t inode -w inode`"
'

test_expect_success 'cat-file an inode' '
	git cat-file inode $inode_sha1 | cmp -s - inode
'

test_expect_success 'tree with inode' '
	test $tree_sha1 = "`git mktree < tree`"
'

test_expect_success 'ls-tree of tree with inode' '
	git ls-tree $tree_sha1 | cmp -s - tree
'

test_expect_success 'check type with cat-file' '
	test inode = "`git cat-file -t $tree_sha1:shadow`"
'

test_expect_success 'cat-file inode tree:inode' '
	git cat-file inode $tree_sha1:shadow | cmp -s - inode
'

test_expect_success 'cat-file blob tree:inode' '
	git cat-file blob $tree_sha1:shadow | cmp -s - shadow
'

test_expect_success 'cat-file props tree:inode' '
	git cat-file props $tree_sha1:shadow | cmp -s - props
'

test_expect_success 'read-tree' '
	git read-tree $tree_sha1
'

test_expect_success 'ls-files shows no modified files' '
	test -z "`git ls-files -m || echo fail`"
'

test_expect_success 'write-tree' '
	test $tree_sha1 = "`git write-tree`"
'

test_expect_success 'commit-tree' '
	COMMIT=`echo Commit with an inode | git commit-tree $tree_sha1` &&
	git update-ref HEAD $COMMIT
'

cat >shadow <<EOF
root:*:13943:0:99999:7:::
jamey:*:13943:0:99999:7:::
josh:*:13943:0:99999:7:::
EOF
shadow_sha1=`git hash-object -t blob shadow`

test_expect_success 'ls-files shows modified file' '
	test "shadow" = "`git ls-files -m`"
'

test_expect_success 'add modified file to index' '
	git add shadow
'

test_expect_success 'commit modification' '
	git commit -m "Modify shadow"
'

test_expect_success 'ls-files shows no modified files, after modification' '
	test -z "`git ls-files -m || echo fail`"
'

test_expect_success 'check type with cat-file, after modification' '
	test inode = "`git cat-file -t HEAD:shadow`"
'

cat > inode <<EOF
blob $shadow_sha1
props $props_sha1
EOF
inode_sha1=FIXME

test_expect_success 'cat-file inode HEAD:inode, after modification' '
	git cat-file inode HEAD:shadow | cmp -s - inode
'

test_expect_success 'cat-file blob HEAD:inode, after modification' '
	git cat-file blob HEAD:shadow | cmp -s - shadow
'

test_expect_success 'cat-file props HEAD:inode, after modification' '
	git cat-file props HEAD:shadow | cmp -s - props
'

test_done
