Shawn O. Pearce wrote:
This triplet would probably be encoded with descendant-commit using OBJ_REF, ancestor-commit being an OBJ_OFS style back reference within the pack (or OBJ_REF if not in this pack) and ancestor-object would also be an OBJ_REF. So a triplet probably would wind up costing ~60 bytes.
I'd propose the following: Have a "shadow" tree which is storing DAG information per entry of the original tree. keep this shadow tree sorted in the same order the original tree object is. for simplicity i will now add the new information to the tree examples, but of course this new data cannot be stored in the tree itself and can't be hashed. tree <filename> <mode> <object> (<new: ancestor commit> <new: ancestor object>)* the ancestor object isn't necessary, it would just speed up annotations. considering that we want to look only at the path, it is superfluous. if we want to traverse copies/moves as well, this could be helpful. now, this information allows us to build a object-level (read: path identifies object) DAG, by starting out on the current tree and retrieving the associated information. using this it is possible to jump to the next commit/tree which changed the object and start over. expense: 8 or 40 bytes per parent, per object, per commit, for each tree modified. per parent: we of course store information about all parents per object: as this is a "shadow" tree, we need to annotate all entries and not just changed ones for each tree modified: (sub)trees not being modified of course do not need annotation *again* but: this shadow tree can be deltified quite tightly, i'd say. possibly with a different, specialized tree delta method. then this boils down to 8 or 40 bytes per *changed* object, plus delta overhead.
This of course penalizes objects which don't ever change, as we'd have to walk back a good chunk of the DAG before we find a matching triplet. But I would suspect that files which never change are also not given to log/blame very often either. And once we do find a triplet, we can skip through the DAG in time proportional to the rate of change for the path, rather than to the entire repository.
using my proposal this penalty does not exist. i think it would be really awkward to have the annotation of a never-changed Makefile to take way longer than the operation on a recently/often changed file. we'd have to compare the space requirements of both approaches. cheers simon -- Serve - BSD +++ RENT this banner advert +++ ASCII Ribbon /"\ Work - Mac +++ space for low €€€ NOW!1 +++ Campaign \ / Party Enjoy Relax | http://dragonflybsd.org Against HTML \ Dude 2c 2 the max ! http://golden-apple.biz Mail + News / \
Attachment:
signature.asc
Description: OpenPGP digital signature