Re: What's cooking in git.git (Sep 2021, #08; Mon, 27)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 28, 2021 at 07:40:14PM -0400, Jeff King wrote:
> On Mon, Sep 27, 2021 at 11:46:40PM -0700, Elijah Newren wrote:
> 
> > > * en/remerge-diff (2021-08-31) 7 commits
> > >  - doc/diff-options: explain the new --remerge-diff option
> > >  - show, log: provide a --remerge-diff capability
> > >  - tmp-objdir: new API for creating and removing primary object dirs
> > >  - merge-ort: capture and print ll-merge warnings in our preferred fashion
> > >  - ll-merge: add API for capturing warnings in a strbuf instead of stderr
> > >  - merge-ort: add ability to record conflict messages in a file
> > >  - merge-ort: mark a few more conflict messages as omittable
> > >
> > >  A new presentation for two-parent merge "--remerge-diff" can be
> > >  used to show the difference between mechanical (and possibly
> > >  conflicted) merge results and the recorded resolution.
> > >
> > >  Will merge to 'next'?
> > 
> > It has been a month that it's been cooking with no issues brought up,
> > and it's been in production for nearly a year...
> > 
> > But just this morning I pinged peff and jrnieder if they might have
> > time to respectively look at the tmp-objdir stuff (patch 5, plus its
> > integration into log-tree.c in patch 7) and the ll-merge.[ch] changes
> > (patch 3).  I don't know if either will have time to do it, but
> > perhaps wait half a week or so to see if they'll mention they have
> > time?  Otherwise, yeah, it's probably time to merge this down.
> 
> Sorry to take so long. I think this is a very exciting topic, and I
> appreciate being called into to look at tmp-objdir stuff, because it can
> be quite subtle.
> 
> I just left a rather long-ish mail in the thread, but the gist of it is
> that I'm worried that there's some possibility of corruption there if
> the diff code writes objects. I didn't do a proof-of-concept there, but
> I worked one up just now. Try this:
> 
>   # make a repo
>   git init repo
>   cd repo
>   echo base >file
>   git add file
>   git commit -m base
> 
>   # two diverging branches
>   echo main >file
>   git commit -am main
>   git checkout -b side HEAD^
>   echo side >file
>   git commit -am side
> 
>   # we get a conflict, and we resolve
>   git merge main
>   echo resolved >file
>   git commit -am merged
> 
>   # now imagine we had a file with a diff driver. I stuffed it
>   # in here after the fact, but it could have been here all along,
>   # or come as part of the merge, etc.
>   echo whatever >unrelated
>   echo "unrelated diff=foo" >.gitattributes
>   git add .
>   git commit --amend --no-edit
> 
>   # set up the diff driver not just to do a textconv, but to cache the
>   # result. This will require writing out new objects for the cache
>   # as part of the diff operation.
>   git config diff.foo.textconv "$PWD/upcase"
>   git config diff.foo.cachetextconv true
>   cat >upcase <<\EOF &&
>   #!/bin/sh
>   tr a-z A-Z <$1
>   EOF
>   chmod +x upcase
> 
>   # now show the diff
>   git log -1 --remerge-diff
> 
>   # and make sure the repo is still OK
>   git fsck
> 
> The remerge diff will correctly show the textconv'd content (because
> it's not in either parent, and hence an evil merge):
> 
>   diff --git a/unrelated b/unrelated
>   new file mode 100644
>   index 0000000..982793c
>   --- /dev/null
>   +++ b/unrelated
>   @@ -0,0 +1 @@
>   +WHATEVER
> 
> but then fsck shows that our cache is corrupted:
> 
>   Checking object directories: 100% (256/256), done.
>   error: refs/notes/textconv/foo: invalid sha1 pointer 1e9b3ecffca73411001043fe914a7ec0e7291043
>   error: refs/notes/textconv/foo: invalid reflog entry 1e9b3ecffca73411001043fe914a7ec0e7291043
>   dangling tree 569b6e414203d2f50bdaafc1585eae11e28afc37
> 
> Now I'll admit the textconv-cache is a pretty seldom-used feature. And
> that there _probably_ aren't a lot of other ways that the diff code
> would try to write objects or references. But I think it speaks to the
> kind of subtle interactions we should worry about (and certainly
> corrupting the repository is a pretty bad outcome, though at least in
> this case it's "just" a cache and we could blow away that ref manually).
> 
> -Peff

It seems to me that one problem is that the new "primary" objdir code doesn't
disable ref updates since the GIT_QUARANTINE_ENVIRONMENT setting isn't set.
If we fix that we should be forbidding ref updates.

I've included a path that fixes my test case. This is on top of:
https://lore.kernel.org/git/CANQDOddqwVtWfC4eEP3fJB4sUiszGX8bLqoEVLcMf=v+jzx19g@xxxxxxxxxxxxxx/

>From 38be7f6d31da8df3f205b35d25dd0a505aa75a8a Mon Sep 17 00:00:00 2001
From: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
Date: Wed, 29 Sep 2021 11:24:05 -0700
Subject: [PATCH] tmp-objdir: disable ref updates when replacing the primary
 odb

When creating a subprocess with a temporary ODB, we set the
GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not
to update refs, since the tmp-objdir may go away.

Introduce a similar mechanism for in-process temporary ODBs when
we call tmp_objdir_replace_primary_odb.

Peff's test case was invoking ref updates via the cachetextconv
setting. That particular code silently does nothing when a ref
update is forbidden

Reported-by: Jeff King <peff@xxxxxxxx>

Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
---
 object-file.c |  7 +++++++
 refs.c        | 22 +++++++++++++++++++++-
 refs.h        |  5 +++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 990381abee..e3e5cec3c8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -770,6 +770,12 @@ struct object_directory *set_temporary_primary_odb(const char *dir, int will_des
 	new_odb->will_destroy = will_destroy;
 	new_odb->next = the_repository->objects->odb;
 	the_repository->objects->odb = new_odb;
+
+	/*
+	 * Disable ref updates while a temporary odb is active, since
+	 * the objects in the database may roll back.
+	 */
+	refs_disable_updates();
 	return new_odb->next;
 }
 
@@ -786,6 +792,7 @@ void restore_primary_odb(struct object_directory *restore_odb, const char *old_p
 
 	the_repository->objects->odb = restore_odb;
 	free_object_directory(cur_odb);
+	refs_enable_updates();
 }
 
 /*
diff --git a/refs.c b/refs.c
index 8b9f7c3a80..98026b7341 100644
--- a/refs.c
+++ b/refs.c
@@ -19,6 +19,8 @@
 #include "repository.h"
 #include "sigchain.h"
 
+static int ref_update_disabled_count;
+
 /*
  * List of all available backends
  */
@@ -1045,6 +1047,24 @@ void ref_transaction_free(struct ref_transaction *transaction)
 	free(transaction);
 }
 
+void refs_disable_updates(void)
+{
+	++ref_update_disabled_count;
+}
+
+void refs_enable_updates(void)
+{
+	if (!ref_update_disabled_count)
+		BUG("ref updates are not disabled");
+
+	--ref_update_disabled_count;
+}
+
+int ref_updates_are_enabled(void)
+{
+	return !ref_update_disabled_count;
+}
+
 struct ref_update *ref_transaction_add_update(
 		struct ref_transaction *transaction,
 		const char *refname, unsigned int flags,
@@ -2126,7 +2146,7 @@ int ref_transaction_prepare(struct ref_transaction *transaction,
 		break;
 	}
 
-	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+	if (getenv(GIT_QUARANTINE_ENVIRONMENT) || !ref_updates_are_enabled()) {
 		strbuf_addstr(err,
 			      _("ref updates forbidden inside quarantine environment"));
 		return -1;
diff --git a/refs.h b/refs.h
index 48970dfc7e..acd7e275c5 100644
--- a/refs.h
+++ b/refs.h
@@ -840,6 +840,11 @@ int ref_storage_backend_exists(const char *name);
 
 struct ref_store *get_main_ref_store(struct repository *r);
 
+/* Helpers to mark updates to the refs as forbidden */
+void refs_disable_updates(void);
+void refs_enable_updates(void);
+int ref_updates_are_enabled(void);
+
 /**
  * Submodules
  * ----------
-- 
2.25.1




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux