Re: using tree as attribute source is slow, was Re: Help troubleshoot performance regression cloning with depth: git 2.44 vs git 2.42

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 01, 2024 at 08:33:52PM -0400, Taylor Blau wrote:
> On Wed, May 01, 2024 at 06:00:30PM -0400, Jeff King wrote:
> > Bisecting show the culprit is 2386535511 (attr: read attributes from
> > HEAD when bare repo, 2023-10-13), which is in v2.43.0. Before that, a
> > bare repository would only look for attributes in the info/attributes
> > file. But after, we look at the HEAD tree-ish, too. And pack-objects
> > will check the "delta" attribute for every path of every object we are
> > packing. And remember that in-tree lookups for foo/bar/baz require
> > looking not just for .gitattributes, but also foo/.gitattributes,
> > foo/bar/.gitattributes, and foo/bar/baz/.gitattributes.
>
> Thanks for the explanation and bisection. I agree that 2386535511 makes
> sense as a likely culprit given what you wrote here.

Here is one possible approach, which is a partial revert of 2386535511.
I thought about suggesting that we revert 2386535511 entirely, but I
think that may be too strong of an approach especially if there are
plans to otherwise improve the performance of attr lookups with some
caching layer.

Instead, this patch changes the behavior to only fallback to "HEAD" in
bare repositories from check-attr, but leaves pack-objects, archive, and
all other builtins alone.

I should note, this is a pretty hacky approach to use the extern'd
git_attr_tree variable from within the check-attr builtin, but I think
that this does do the trick.

Alternatively, if this is too hacky or magical that check-attr does one
thing but every other command does something else, I would personally be
fine with a full revert of 2386535511.

Anyway, here is the patch:

--- 8< ---

Subject: [PATCH] attr.c: only read attributes from HEAD via check-attr

This patch is a partial revert of commit 2386535511d (attr: read
attributes from HEAD when bare repo, 2023-10-13), which caused Git to
start reading from .gitattributes files from HEAD^{tree} when invoked in
a bare repository.

This patch has an unfortunate side-effect of significantly slowing down
pack-objects, for example, when invoked in a bare repository without
using reachability bitmaps.

Prior to 2386535511d, pack-objects would only look at the
info/attributes file when working in a bare repository. But after,
pack-objects ends up looking at every "delta" attribute not just in the
info/attributes file, but for every .gitattributes file in each tree
recursively from the root down to the dirname of whatever path we're
inspecting. In other words, gathering attributes for path foo/bar/baz
requires reading .gitattributes, foo/.gitattributes,
foo/bar/.gitattributes, and foo/bar/baz/.gitattributes.

Restore the pre-2386535511d behavior for commands other than check-attr
(which was the intended target of the change described in 2386535511d).

If we want to cause pack-objects to use HEAD^{tree} as an attributes
source in bare repositories by default again, it would likely come after
some caching layer to avoid the performance penalty.

Reported-by: Dhruva Krishnamurthy <dhruvakm@xxxxxxxxx>
Helped-by: Jeff King <peff@xxxxxxxx>
Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
---
 attr.c                  | 7 -------
 builtin/check-attr.c    | 2 ++
 t/t5001-archive-attr.sh | 2 +-
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/attr.c b/attr.c
index 7c380c17317..33473bdce01 100644
--- a/attr.c
+++ b/attr.c
@@ -1220,17 +1220,10 @@ static void compute_default_attr_source(struct object_id *attr_source)
 	if (!default_attr_source_tree_object_name && git_attr_tree) {
 		default_attr_source_tree_object_name = git_attr_tree;
 		ignore_bad_attr_tree = 1;
 	}

-	if (!default_attr_source_tree_object_name &&
-	    startup_info->have_repository &&
-	    is_bare_repository()) {
-		default_attr_source_tree_object_name = "HEAD";
-		ignore_bad_attr_tree = 1;
-	}
-
 	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
 		return;

 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index c1da1d184e9..9b445fe33c6 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -188,10 +188,12 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)

 	if (source) {
 		if (repo_get_oid_tree(the_repository, source, &initialized_oid))
 			die("%s: not a valid tree-ish source", source);
 		set_git_attr_source(source);
+	} else if (startup_info->have_repository && is_bare_repository()) {
+		git_attr_tree = "HEAD";
 	}

 	if (stdin_paths)
 		check_attr_stdin_paths(prefix, check, all_attrs);
 	else {
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index eaf959d8f63..0ff47a239db 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -136,11 +136,11 @@ test_expect_success 'git archive with worktree attributes, bare' '
 	(cd bare && git archive --worktree-attributes HEAD) >bare-worktree.tar &&
 	(mkdir bare-worktree && cd bare-worktree && "$TAR" xf -) <bare-worktree.tar
 '

 test_expect_missing	bare-worktree/ignored
-test_expect_missing	bare-worktree/ignored-by-tree
+test_expect_exists	bare-worktree/ignored-by-tree
 test_expect_exists	bare-worktree/ignored-by-worktree

 test_expect_success 'export-subst' '
 	git log "--pretty=format:A${SUBSTFORMAT}O" HEAD >substfile1.expected &&
 	test_cmp nosubstfile archive/nosubstfile &&
--
2.45.0.1.g3e84e921a0a.dirty

--- >8 ---

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux