Re: using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

>   - the cache here is static-local in the function. It should probably
>     at least be predicated on the tree_oid, and maybe attached to the
>     repository object? I think having one per repository at a time would
>     be fine (generally the tree_oid is set once per process, so it's not
>     like you're switching between multiple options).

It should be per tree_oid or you will get a stale and incorrect
result when you read paths from a different tree.  But thanks for
that "something simple and stupid" code to clearly demonstrate
that repeated reading of the attributes data is the problem.

Given a tree with a name, the result of reading a path from that
tree does not depend on the repository the tree appears in, so the
cache does not have a reason to be tied to a particular repository.
Generally we work only inside a single repository, so attaching the
cache to that single repository would be a good way to make it
available globally without adding another global variable, as
the_repository can serve as the starting point for the global state,
but other than that there is no reason.

I agree that the attribute layer may be a better place to cache this
data.  As you pointed out, it already has a caching behaviour in its
attr_stack data structure that is optimized for local walk that
visits every path in a tree in depth first order, but it is likely
that a different caching scheme that is more suitable for random
access may need to be introduced.  The cache eviction strategy may
need some thought (the attr_stack based caching has an obviously
optimal eviction strategy---to evict the attribute data read from a
directory when the traversal leaves that directory) in order to
avoid unbounded bloat of the cached data.

> I've cc'd John as the author of 2386535511. But really, that was just
> enabling by default the attr-tree code added by 47cfc9bd7d (attr: add
> flag `--source` to work with tree-ish, 2023-01-14). Although in that
> original context (git check-attr) the lack of caching would be much less
> important.
>
> -Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux