Re: [PATCH v4 00/24] Index-v5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:

> Hi,
>
> previous rounds (without api) are at $gmane/202752, $gmane/202923,
> $gmane/203088 and $gmane/203517, the previous rounds with api were at
> $gmane/229732, $gmane/230210 and $gmane/232488.  Thanks to Duy for
> reviewing the the last round and Junio, Ramsay and Eric for additional
> comments.
>
> Since the last round I've added a POC for partial writing, resulting
> in the following performance improvements for update-index:
>
> Test                                        1063432           HEAD
> ------------------------------------------------------------------------------------
> 0003.2: v[23]: update-index                 0.60(0.38+0.20)   0.76(0.36+0.17) +26.7%
> 0003.3: v[23]: grep nonexistent -- subdir   0.28(0.17+0.11)   0.28(0.18+0.09) +0.0%
> 0003.4: v[23]: ls-files -- subdir           0.26(0.15+0.10)   0.24(0.14+0.09) -7.7%
> 0003.7: v[23] update-index                  0.59(0.36+0.22)   0.58(0.36+0.20) -1.7%
> 0003.9: v4: update-index                    0.46(0.28+0.17)   0.45(0.30+0.11) -2.2%
> 0003.10: v4: grep nonexistent -- subdir     0.26(0.14+0.11)   0.21(0.14+0.07) -19.2%
> 0003.11: v4: ls-files -- subdir             0.24(0.14+0.10)   0.20(0.12+0.08) -16.7%
> 0003.14: v4 update-index                    0.49(0.31+0.18)   0.65(0.34+0.17) +32.7%
> 0003.16: v5: update-index                   0.53(0.30+0.22)   0.50(0.28+0.20) -5.7%
> 0003.17: v5: ls-files                       0.27(0.15+0.12)   0.27(0.17+0.10) +0.0%
> 0003.18: v5: grep nonexistent -- subdir     0.02(0.01+0.01)   0.03(0.01+0.01) +50.0%
> 0003.19: v5: ls-files -- subdir             0.02(0.00+0.02)   0.02(0.01+0.01) +0.0%
> 0003.22: v5 update-index                    0.53(0.29+0.23)   0.02(0.01+0.01) -96.2%
>
> Given this, I don't think a complete change of the in-core format for
> the cache-entries is necessary to take full advantage of the new index
> file format.  Instead some changes to the current in-core format would
> work well with the new on-disk format.
>
> The current in-memory format fits the internal needs of git fairly well,
> so I don't think changing it to fit a better index file format would
> make a lot of sense, given that we can take advantage of the new format
> with the existing in-memory format.

Any more opinions on this series?  I've applied the changes suggested by
Duy, Antoine and Eric locally, but I wouldn't want to spam the list with
the whole series without a chance of this being applied.  How do you
want me to proceed?

> This series doesn't use kb/fast-hashmap yet, but that should be fairly
> simple to change if the series is deemed a good change.  The
> performance tests for update-index test require
> tg/perf-lib-test-perf-cleanup.
>
> Other changes, made following the review comments are:
>
> documentation: add documentation of the index-v5 file format
>   - Update documentation that directory flags are now 32-bits.  That
>     makes aligned access simpler
>   - offset_to_offset is no longer included in the checksum for files.
>     It's unnecessary.
>
> read-cache: read index-v5
>   - Add fix for reading with different level pathspecs given
>   - Use init_directory_entry to initialize all fields in a new
>     directory entry
>   - use memset to simplify the create_new_conflict function
>   - Add comments to explain -5 when reading directories and files
>   - Add comments for the more complex functions
>   - Add name flex_array to the end of ondisk_directory_entry for
>     simplified reading
>   - Add name flex_array to the end of ondisk_cache_entry for
>     simplified reading
>   - Move conflict reading functions to next patch
>   - mark functions as static when they are
>
> read-cache: read resolve-undo data
>   - Add comments for the more complex function
>   - Read conflicts + resolve undo data as extension
>
> read-cache: read cache-tree in index-v5
>   - Add comments for the more complex function
>   - Instead of sorting the directory entries, sort the cache-tree
>     directly.  This also required changing the algorithms with which
>     the cache entries are extracted from the directory tree.
>
> read-cache: write index-v5
>   - Free pointers allocated by super_directory
>   - Rewrite condition as suggested by Duy
>   - Don't check for CE_REMOVE'd entries in the writing code, they are
>     already checked in the compile_directory_data code
>   - Remove overly complicated directory size calculation since flags
>     are now 32-bits
>
> read-cache: write resolve-undo data for index-v5
>   - Free pointers allocated by super_directory
>   - Write conflicts + resolve undo data as extension
>
> introduce GIT_INDEX_VERSION environment variable
>   - Add documentation for GIT_INDEX_VERSION
>
> test-lib: allow setting the index format version
>
> Removed commits:
>   - read-cache: don't check uid, gid, ino
>   - read-cache: use fixed width integer types (independently in pu)
>   - read-cache: clear version in discard_index()
>
> Typos fixed as suggested by Eric Sunshine
>
> Thomas Gummerer (22):
>   read-cache: split index file version specific functionality
>   read-cache: move index v2 specific functions to their own file
>   read-cache: Re-read index if index file changed
>   add documentation for the index api
>   read-cache: add index reading api
>   make sure partially read index is not changed
>   grep.c: use index api
>   ls-files.c: use index api
>   documentation: add documentation of the index-v5 file format
>   read-cache: make in-memory format aware of stat_crc
>   read-cache: read index-v5
>   read-cache: read resolve-undo data
>   read-cache: read cache-tree in index-v5
>   read-cache: write index-v5
>   read-cache: write index-v5 cache-tree data
>   read-cache: write resolve-undo data for index-v5
>   update-index.c: rewrite index when index-version is given
>   introduce GIT_INDEX_VERSION environment variable
>   test-lib: allow setting the index format version
>   t1600: add index v5 specific tests
>   POC for partial writing
>   perf: add partial writing test
>
> Thomas Rast (1):
>   p0003-index.sh: add perf test for the index formats
>
>  Documentation/git.txt                            |    5 +
>  Documentation/technical/api-in-core-index.txt    |   56 +-
>  Documentation/technical/index-file-format-v5.txt |  294 +++++
>  Makefile                                         |   10 +
>  builtin/apply.c                                  |    2 +
>  builtin/grep.c                                   |   69 +-
>  builtin/ls-files.c                               |   36 +-
>  builtin/update-index.c                           |   50 +-
>  cache-tree.c                                     |   15 +-
>  cache-tree.h                                     |    2 +
>  cache.h                                          |  115 +-
>  lockfile.c                                       |    2 +-
>  read-cache-v2.c                                  |  561 +++++++++
>  read-cache-v5.c                                  | 1406 ++++++++++++++++++++++
>  read-cache.c                                     |  691 +++--------
>  read-cache.h                                     |   67 ++
>  resolve-undo.c                                   |    1 +
>  t/perf/p0003-index.sh                            |   74 ++
>  t/t1600-index-v5.sh                              |   25 +
>  t/t2101-update-index-reupdate.sh                 |   12 +-
>  t/test-lib-functions.sh                          |    5 +
>  t/test-lib.sh                                    |    3 +
>  test-index-version.c                             |    6 +
>  unpack-trees.c                                   |    3 +-
>  24 files changed, 2921 insertions(+), 589 deletions(-)
>  create mode 100644 Documentation/technical/index-file-format-v5.txt
>  create mode 100644 read-cache-v2.c
>  create mode 100644 read-cache-v5.c
>  create mode 100644 read-cache.h
>  create mode 100755 t/perf/p0003-index.sh
>  create mode 100755 t/t1600-index-v5.sh
>
> --
> 1.8.4.2
>

--
Thomas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]