Re: [RFC] Plumbing-only support for storing object metadata

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Sat, 9 Aug 2008 20:51:01 -0700

Scott Chacon <schacon@xxxxxxxxx> wrote:
> > We began trying to implement this proposal, but we found this enum
> > definition in cache.h, which made us think there's only room for one
> > more kind of object:
> >
> >        enum object_type {
> >                OBJ_BAD = -1,
> >                OBJ_NONE = 0,
> >                OBJ_COMMIT = 1,
> >                OBJ_TREE = 2,
> >                OBJ_BLOB = 3,
> >                OBJ_TAG = 4,
> >                /* 5 for future expansion */
> >                OBJ_OFS_DELTA = 6,
> >                OBJ_REF_DELTA = 7,
> >                OBJ_ANY,
> >                OBJ_MAX,
> >        };
> >
> > Do these object_type values appear in any on-disk structure, or does any
> > other reason exist why this set of values cannot change? Can we add
> > additional object types for inodes and props? If not, what would you
> > recommend instead?
> 
> If I'm not mistaken, these are the values used to identify data in the
> header sections of packfile objects.  The first four bits are used to
> identify the object type, where the first bit is static and the next
> three are the object type of the data following the header.  Since the
> type is encoded using those three bits, 0-7 is the valid range.  I
> would assume that would be difficult to change, since all the
> packfiles depend on that range.

Correct.  There is only room in the pack file for 3 bits in the
type field, resulting in types 0-7 as being the only valid range.

Only type 0 and 5 are available for use.

Nico and I have (at least in the past) agreed that type 0 is meant
as an escape indicator.  If the type is set to 0 then the real type
code appears in another byte of data which follows the object's
inflated length.

That leaves only type 5 available.  Note that because type 5 can be
encoded into a really small space (3 bits) compared to any other
type we may add we really want to use it for something which will
appear _very_frequently_.  The OBJ_DICT_TREE encoding we were talking
about doing for pack v4 fits that bill, as nearly any project (even
huge ones like Mozilla or KDE) would probably be using OBJ_DICT_TREE
thoughout their pack files, and there is a noticable reduction in
disk usage (and increased performance due to lower page faults)
as a result.

The proposed "inode" and "props" types sound like they are useful
for only less common cases, and would appear very infrequently
compared to a tree object.

So yea, there really aren't any new type bits available.

But tossing aside the type bit argument, I'm not sure I see the
value in adding limited arbitrary properties to names in a tree.
How does one edit these?  How do you inspect them before you get
a checkout, assuming they might actually have an impact on the
checkout process?  How the hell do you merge them?

I'm also very concerned about the limited range of values for both
keys and values in a "props" type.  Even if we did go down this
road of supporting such a concept at the plumbing layer (and in the
storage modal) everwhere else we are 8-bit clean.  Commit messages,
tag messages, blob contents, even file names in tree objects.
(OK, file names cannot contain a NUL byte, but whatever, that is
their only limitation.)

The proper encoding for both keys and values should permit any data
to be stored.  Doesn't the extended attributes feature in Linux and
FreeBSD both support any data to be attached to an inode in the fs?

Please don't get me wrong.

I think this is a _BAD_ idea.

A bad idea that will only clutter up the core object model, and
the core processing code of that object model.  Extended attributes
aren't used that much on local filesystems, because they are hard
to work with and suck performance wise.  Performance in Git is
a _feature_.  It matters.  Our clean object model really helps to
make that possible.

-- 
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html