Re: metastore

david@xxxxxxx · Tue, 2 Oct 2007 17:52:25 -0700 (PDT)

On Wed, 3 Oct 2007, Julian Phillips wrote:


On Tue, 2 Oct 2007, David Härdeman wrote:

On Tue, Oct 02, 2007 at 10:04:56PM +0200, David Kastrup wrote:
David Härdeman <david@xxxxxxxxxxx> writes:

>  On Tue, Oct 02, 2007 at 08:53:01PM +0100, martin f krafft wrote:
> > also sprach David Härdeman <david@xxxxxxxxxxx> [2007.09.19.2016 +0100]:
> > >  But I agree, if any changes were made to git, I'd advocate adding
> > >  arbitrary attributes to files (much like xattrs) in name=value
> > >  pairs, then any extended metadata could be stored in those
> > >  attributes and external scripts/tools could use them in some way
> > >  that makes sense...and also make sure to only update them when it
> > >  makes sense.
> > So where would those metadata be stored in your opinion?
>
>  I'm not sufficiently versed in the internals of git to have an
>  informed opinion :)

I think we have something like a length count for file names in index
and/or tree.  We could just put the (sorted) attributes after a NUL
byte in the file name and include them in the count.  It would also
make those artificially longer file names work more or less when
sorting them for deltification.
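
A rough sketch of what that encoding could look like, to make the idea concrete. This is not git's actual index or tree format; the function names and the choice of NUL as the separator between pairs are assumptions made for illustration only, and values containing a NUL byte are not handled.

```python
def pack_name(path: str, attrs: dict[str, str]) -> bytes:
    """Hypothetical encoding: path, then NUL-separated sorted name=value pairs."""
    fields = [path.encode("utf-8")]
    # Sorting keeps the encoded name stable regardless of insertion order.
    for key in sorted(attrs):
        fields.append(f"{key}={attrs[key]}".encode("utf-8"))
    return b"\0".join(fields)


def unpack_name(raw: bytes) -> tuple[str, dict[str, str]]:
    """Split an encoded name back into the path and its attributes."""
    path, *pairs = raw.split(b"\0")
    attrs = {}
    for pair in pairs:
        key, _, value = pair.decode("utf-8").partition("=")
        attrs[key] = value
    return path.decode("utf-8"), attrs


packed = pack_name("a.txt", {"owner": "root", "mode": "0644"})
print(len(packed), unpack_name(packed))
```

The length count would then cover the whole packed byte string, so a reader that only wants the path stops at the first NUL.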

Or perhaps the index format could be extended to include a new field for 
name=value pairs instead of overloading the name field.

But as I said, I have no idea how feasible it would be to change git to 
support another arbitrary length field in the index/tree file.

However, this requires implementing _policies_: it must be possible to
specify per repository exactly what will and what won't get tracked,
or one will get conflicts that are not necessary or appropriate.

I think the opposite approach would be better. Let git provide 
set/get/delete attribute operations and leave it at that. Then external 
programs can do what they want with that data and add/remove/modify tags as 
necessary (and also include the smarts to not, e.g. remove the permissions 
on all files if the git repo is checked out to a FAT fs).

You need more than that.  You need to be able to log, blame etc on the 
attributes.  One of the big annoyances of Subversion properties is being 
unable to find out when or why a property value was changed.

I still don't see why the attributes need to be stored in git directly - 
particularly if you are going to use an external program to actually apply 
any settings - why not store the attributes as normal file (or files) of some 
sort tracked by git?  You could use any number of methods - e.g. use an 
sqlite database stored in the root of your tree, or a .<name>.props file 
alongside each path that you have properties for.  You could even write a 
system that uses such a method and was then SCM agnostic, allowing you to 
keep your attribute tracking system if/when something better than git comes 
along - or simply share it with less-fortunate souls stuck in an inferior 
system.
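
As a minimal sketch of the per-path variant, assuming a sidecar file named .<name>.props that holds sorted name=value lines: the helper names are hypothetical, and only the permission bits are recorded here. Since the sidecar is an ordinary text file, the usual log and blame tools would apply to it line by line.

```python
import os
import stat
from pathlib import Path


def props_path(target: Path) -> Path:
    """Hypothetical naming scheme: 'dir/foo' -> 'dir/.foo.props'."""
    return target.with_name(f".{target.name}.props")


def save_props(target: Path) -> Path:
    """Record the permission bits of target as sorted name=value lines."""
    mode = stat.S_IMODE(target.stat().st_mode)
    props = {"mode": format(mode, "04o")}
    out = props_path(target)
    out.write_text("".join(f"{k}={v}\n" for k, v in sorted(props.items())))
    return out


def apply_props(target: Path) -> None:
    """Read the sidecar file and restore the recorded permission bits."""
    props = {}
    for line in props_path(target).read_text().splitlines():
        name, _, value = line.partition("=")
        props[name] = value
    if "mode" in props:
        os.chmod(target, int(props["mode"], 8))
```

An external tool would call save_props before a commit and apply_props after a checkout; it could skip apply_props entirely on a filesystem such as FAT that cannot hold the permissions.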

one other big advantage of keeping things in a normal file: it's easier to 
get the results accepted into git!

don't forget that the core git maintainers don't really see this as a 
worthwhile effort, so the more intrusive the result is the less likely it 
is to be accepted. It may end up that storing the attributes inside of git 
_is_ the best thing to do, but it's going to be a whole lot easier to get a 
patch to implement this accepted if it's a migration from an existing, 
heavily used, implementation than if it's from the 'outside' with people 
saying "this is a neat thing, we think people would use it if it only had 
this".

and even if an internal implementation does end up being the right thing, 
the exact shape of the API is an item that will require a lot of debate 
(and probably a few false starts) to get right. let's figure out the 
real-world usage patterns first, and then work from there as appropriate.

shifting back to implementation details

in the discussion a few weeks ago I was told that there is a way to look 
at the contents of a file that hasn't been checked out yet (somehow it 
exists in a useable form 'in the index') but when I asked for information 
about how to do this I never got a response.

the reason for needing this is that the routines writing the files need to 
be able to access this information when they are doing so, but that file 
may not be checked out.
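
One possible way to do this, sketched under assumptions: git's revision syntax lets ":<path>" name the index entry for a path, and "git cat-file blob" prints a blob's contents, so the staged copy can be read without the file being present in the working tree. The Python wrapper and its function names are hypothetical, and the path is taken to be relative to the top of the repository.

```python
import subprocess


def index_blob_args(path: str) -> list[str]:
    """Build the git command that prints the indexed contents of path."""
    # ":<path>" refers to the entry for path in the index, not the working tree.
    return ["git", "cat-file", "blob", f":{path}"]


def read_from_index(path: str, repo: str = ".") -> bytes:
    """Return the staged contents of path; raises if git reports an error."""
    result = subprocess.run(
        index_blob_args(path), cwd=repo, check=True, capture_output=True
    )
    return result.stdout
```

A routine writing files out could call read_from_index on the attribute file before touching any other path, so the checkout order would not matter.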

for that matter, .gitattributes should have a similar problem (if 
.gitattributes for a directory hasn't been checked out yet how do you know 
if you should do the line ending conversions on a file or not?). how is the 
problem addressed there? (or is it the case that all the use so far has 
really not used the per-directory files and everything is in the master 
file, and that doesn't change enough to find these problems?)

David Lang