On Thu, Sep 30, 2021 at 1:16 AM Jeff King <peff@xxxxxxxx> wrote: > > On Wed, Sep 29, 2021 at 11:43:39AM -0700, Neeraj Singh wrote: > > > It seems to me that one problem is that the new "primary" objdir code doesn't > > disable ref updates since the GIT_QUARANTINE_ENVIRONMENT setting isn't set. > > If we fix that we should be forbidding ref updates. > > > > I've included a path that fixes my test case. This is on top of: > > https://lore.kernel.org/git/CANQDOddqwVtWfC4eEP3fJB4sUiszGX8bLqoEVLcMf=v+jzx19g@xxxxxxxxxxxxxx/ > > Ah, right, I forgot we had that "forbid ref updates in quarantine" magic > (despite being the one who added it). > > I do think this improves the safety, but things are still a bit more > dangerous because we're handling it all in a single process, which sees > both the quarantine and non-quarantine states. I wrote more details in > this reply a few minutes ago: > > https://lore.kernel.org/git/YVVmssXlaFM6yD5W@xxxxxxxxxxxxxxxxxxxxxxx/ > > (sorry, it's long; search for the paragraph starting with "Whereas doing > it in-process"). > > > When creating a subprocess with a temporary ODB, we set the > > GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not > > to update refs, since the tmp-objdir may go away. > > > > Introduce a similar mechanism for in-process temporary ODBs when > > we call tmp_objdir_replace_primary_odb. > > > > Peff's test case was invoking ref updates via the cachetextconv > > setting. That particular code silently does nothing when a ref > > update is forbidden > > Oh good. I was worried that it would generate ugly errors, rather than > silently skipping the update. > > > @@ -2126,7 +2146,7 @@ int ref_transaction_prepare(struct ref_transaction *transaction, > > break; > > } > > > > - if (getenv(GIT_QUARANTINE_ENVIRONMENT)) { > > + if (getenv(GIT_QUARANTINE_ENVIRONMENT) || !ref_updates_are_enabled()) { > > strbuf_addstr(err, > > _("ref updates forbidden inside quarantine environment")); > > return -1; > > I think the overall goal of this patch makes sense, but this > conditional, along with tmp-objdir reaching out to the refs code, feels > funny. Should we perhaps have a single: > > int tmp_objdir_is_primary(struct repository *r) > { > if (the_tmp_objdir && > !strcmp(the_tmp_objdir->path.buf, r->objects->odb->path)) > return 1; /* our temporary is the primary */ > > if (getenv(GIT_QUARANTINE_PATH)) > return 1; /* our caller put us in quarantine */ > > return 0; > } > > Then it's all nicely abstracted and the ref code does not have to know > the details of GIT_QUARANTINE_PATH in the first place. > > If you do got that route, the strcmp() might need to be a little more > careful about whether r->objects can be NULL (I didn't check). > Alternatively, I kind of wonder if "struct object_directory" should just > have a flag that says "is_temporary". Actually, wouldn't this be the safest approach, for my particular usecase? By having the quarantine in place, no refs will be updated, which removes the risk of new refs mentioning objects that will go away. The only issue that could arise would be from new objects referencing objects that will go away. But the new objects are also written to the temporary object store when it remains the primary, and thus those new objects will also be deleted when the temporary object store is. In contrast, moving the temporary object store to the end of the alternates list would allow new objects to be written that reference the objects that will go away. Using pretend_object_file() will also allow new objects to be written that reference the objects that will go away. Said another way, I don't think anything should be writing a critical file that needs to be durable when we're in the middle of a "read-only" process like git-log. The only things written should be temporary stuff, like the automatic remerge calculation from merge-ort, the textconv cache optimization stuff, or perhaps future gitattributes transformation caching. All that stuff can safely be blown away at the completion of each merge.