Re: [PATCH v1 0/5] Allocate cache entries from memory pool

Jameson Miller <jameson.miller81@xxxxxxxxx> · Mon, 23 Apr 2018 12:19:05 -0400

On 04/18/2018 12:49 AM, Junio C Hamano wrote:
Jameson Miller <jamill@xxxxxxxxxxxxx> writes:

This patch series improves the performance of loading indexes by
reducing the number of malloc() calls. ...

Jameson Miller (5):
   read-cache: teach refresh_cache_entry to take istate
   Add an API creating / discarding cache_entry structs
   mem-pool: fill out functionality
   Allocate cache entries from memory pools
   Add optional memory validations around cache_entry lifecyle

  apply.c                |  26 +++---
  blame.c                |   5 +-
  builtin/checkout.c     |   8 +-
  builtin/difftool.c     |   8 +-
  builtin/reset.c        |   6 +-
  builtin/update-index.c |  26 +++---
  cache.h                |  40 ++++++++-
  git.c                  |   3 +
  mem-pool.c             | 136 ++++++++++++++++++++++++++++-
  mem-pool.h             |  34 ++++++++
  merge-recursive.c      |   4 +-
  read-cache.c           | 229 +++++++++++++++++++++++++++++++++++++++----------
  resolve-undo.c         |   6 +-
  split-index.c          |  31 +++++--
  tree.c                 |   4 +-
  unpack-trees.c         |  27 +++---
  16 files changed, 476 insertions(+), 117 deletions(-)

base-commit: cafaccae98f749ebf33495aec42ea25060de8682
I couldn't quite figure out what these five patches were based on,
even with this line.  Basing on and referring to a commit that is
not part of our published history with "base-commit" is not all that
useful to others.
My apologies - this patch series should be applied to the 'next'
branch.  It applies cleanly on top of
b46fe60e1d ("merge-fix/ps/test-chmtime-get", 2018-04-20), which
is a commit in the 'next' branch.
Offhand without applying these patches and viewing the changes in
wider contexts, one thing that makes me wonder is how the two
allocation schemes can be used with two implementations of free().
Upon add_index_entry() that replaces an index entry for an existing
path, we'd discard an entry that was originally read as part of
read_cache().  If we do that again, the second add_index_entry()
will be discading, in its call to replace_index_entry(), the entry
that was allocated by the caller of the first add_index_entry()
call.  And replace_index_entry() would not have a way to know from
which allocation the entry's memory came from.

Perhaps it is not how you are using the "transient" stuff, and from
the comment in 2/5, it is for "entries that are not meant to go into
the index", but then higher stage index entries in unpack_trees seem
to be allocated via the "transient" stuff, so I am not sure what the
plans are for things like merge_recursive() that uses unpack_trees()
to prepare the index and then later "fix it up" by further futzing
around the index to adjust for renames it finds, etc.

Good points. The intention with this logic is that any entries
that *could* go into an index are allocated from the memory
pool. The "transient" entries only exist for a short period of
time. These have a defined lifetime and we can always trace the
corresponding "free" call. make_transient_cache_entry() is only
used to construct a temporary cache_entry to pass to the
checkout_entry() / write_entry(). There is a note in checkout.c
indicating that write_entry() needs to be re-factored to not take
a cache_entry.

The cache_entry type could gain knowledge about where it was
allocated from. This would allow us to only have a
single "free()" function, which could inspect the cache_entry to
see if it was allocated from a mem_pool (and possibly which
mem_pool) and take the appropriate action. The downside of this
approach is that cache_entry would need to gain a field to track
this information, and I *think* all of the bit field spots are
taken.

Let me read it fully once we know where these patches are to be
applied, but before that I cannot say much about them X-<.

Thanks.