Re: More precise tag following

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shawn O. Pearce wrote:
This triplet would probably be encoded with descendant-commit using
OBJ_REF, ancestor-commit being an OBJ_OFS style back reference within
the pack (or OBJ_REF if not in this pack) and ancestor-object would
also be an OBJ_REF.  So a triplet probably would wind up costing
~60 bytes.

I'd propose the following:

Have a "shadow" tree which is storing DAG information per entry of the original tree.  keep this shadow tree sorted in the same order the original tree object is.  for simplicity i will now add the new information to the tree examples, but of course this new data cannot be stored in the tree itself and can't be hashed.

tree
<filename> <mode> <object> (<new: ancestor commit> <new: ancestor object>)*

the ancestor object isn't necessary, it would just speed up annotations.  considering that we want to look only at the path, it is superfluous.  if we want to traverse copies/moves as well, this could be helpful.

now, this information allows us to build a object-level (read: path identifies object) DAG, by starting out on the current tree and retrieving the associated information.  using this it is possible to jump to the next commit/tree which changed the object and start over.

expense:  8 or 40 bytes per parent, per object, per commit, for each tree modified.

per parent:  we of course store information about all parents
per object:  as this is a "shadow" tree, we need to annotate all entries and not just changed ones
for each tree modified:  (sub)trees not being modified of course do not need annotation *again*

but:  this shadow tree can be deltified quite tightly, i'd say.  possibly with a different, specialized tree delta method.  then this boils down to 8 or 40 bytes per *changed* object, plus delta overhead.

This of course penalizes objects which don't ever change, as we'd
have to walk back a good chunk of the DAG before we find a matching
triplet.  But I would suspect that files which never change are
also not given to log/blame very often either.  And once we do find
a triplet, we can skip through the DAG in time proportional to the
rate of change for the path, rather than to the entire repository.

using my proposal this penalty does not exist.  i think it would be really awkward to have the annotation of a never-changed Makefile to take way longer than the operation on a recently/often changed file.

we'd have to compare the space requirements of both approaches.

cheers
 simon

--
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]