Re: [PATCH/RFC v2 0/16] Introduce index file format version 5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 8, 2012 at 8:38 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> If the workload we _care_ about is served better by using an API
> that works over an in-core tree-shaped index data structure, I do
> not think it is unreasonable to read the v2 on-disk format and
> represent it as a tree-shaped index while we read it.  Of course,
> there are things that are not as effective when reading from the
> flat v2 on-disk format (e.g. path limited reading will have to at
> least _scan_ the whole thing, even though it may process only the
> entries that match the pathspec) compared to reading from a
> tree-shaped on-disk format, but I doubt that the difference between
> the cost of reading into a flat array and the cost of reading and
> forming whatever non-flat data structure you seem to think is better
> is so big that it would negate the benefit of using a better in-core
> structure.

OK how about this. The general idea is preserve/extend current flat
index API and add a new (tree-based) one. Index users can use either.
They can even mix them up (which they do because we can't just flip
the API in one day for about 200 source files).

The day that unpack_trees() is converted to tree API, I will declare
v5 victory ;)

= Cleanup =

struct cache_entry becomes partly opaque. ce_ctime..ce_gid are hidden
in -v2.c and -v5.c. We only expose ce_size, ce_flags, ce_namelen, name
and sha1 to index users. Extra v5 fields like ce_stat_crc, next and
dir_next are also hidden. These fields can be put in a real struct in
read-cache.h, which is supposedly included by -v2.c and -v5.c

= Updating =

All index update API (add_index_entry, add_to_index,
remove_index_entry_at, remove_marked_cached_entries) are hooked by v5
when the loaded index is v5. v5 can update internal data when these
are called (e.g. conflict resolution), or just mark them "dirty" to be
worked on later in flush_index().

Anybody who updates a cache_entry is supposed to call
cache_entry_updated() function, which is no-op for v2 but v5 may want
to watch this activity.

Refreshing index is a special operation. Of course it's hooked by v5.
v5 may need its own implementation because it could walk working tree
and index tree at the same time. Of course v5 impl must also update
flat API data structure along the way.

A new function flush_index() is introduced, where v5 can update all
internal data and keep it in sync with index_state. When flat/tree
APIs are mixed, flush_index() must be called when switching from flat
API to tree API.

To help v5 deal with index rewrite in unpack_trees(),
index_bulk_update() may be introduced, which tells v5 "we are going to
do a lot of adding/removing/shuffling, keep your actions to minimum,
you most likely have to rebuild the trees at flush_index() anyway"

New API may be introduced for some big operations if it proves
v5-beneficial. I'm thinking of adding/removing a bunch of files by
pathspec, where v5 can walk working directory at the same time it
walks index directory tables.

= Tree traversal =

I don't see big problems here. We support opendir/readdir-like API for
tree traversing (with pathspec filtering). We also support
lookup_cache_entry to get cache_entry* of a certain path.

When tree traversal gets to a conflict entry, it lets the caller know
there's a conflict entry, it does not traverse through stage 1-3
during traversal. Caller is expected to use conflict lookup API for
that.

We also support reading partial index, filtered by pathspec. On v2, it
reads full index.

= Tree update =

At some point we may want to work on trees exclusively. Any operations
here must keep flat API data structure in sync.

We may want to postpone the sync if it's a lot of work, by doing all
the work in flush_index() before caller switches from tree API to flat
API again.

= Flat API deprecation =

At some point, tree update API will not update flat API any more
unless explicitly asked by caller. I don't expect "cache" in struct
index_state to be removed, unless we do really good merges using tree
API.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]