On Mon, Jan 06, 2020 at 03:47:53PM -0800, Jonathan Nieder wrote: > >> + * Callers are responsible for calling write_object_file to record the > >> + * object in persistent storage before writing any other new objects > >> + * that reference it. > >> + */ > >> int pretend_object_file(void *, unsigned long, enum object_type, > >> struct object_id *oid); > > > > I think this is an improvement over the status quo, but it's still a > > potential trap for code which happens to run in the same process (see my > > other email in the thread). > > > > Should the message perhaps be even more scary? > > A pet peeve of mine is warning volume escalation: if it becomes common > for us to say > > * Warning: callers are reponsible for [...] > > then new warnings trying to stand out might say > > * WARNING: callers are responsible for [...] > > and then after we are desensitized to that, we may switch to > > * WARNING WARNING WARNING, not the usual blah-blah: callers are > > and so on. The main way I have found to counteract that is to make > the "dangerous curve" markers context-specific enough that people > don't learn to ignore them. After all, sometimes a concurrency > warning is important to me, at other times warnings about clarity may > be what attract my interest, and so on. I meant less about the number of capital letters, and more that we should be saying "this interface is dangerous; don't use it". Because it's not just "callers are responsible". It's "this can cause subtle and hard-to-debug issues because it's writing to global state". My preferred solution would actually be to rip it out entirely, but we'd need some solution for git-blame, the sole caller. Possibly it could insert the value straight into the diff_filespec. But according to the thread that I linked earlier, I poked at that last year but it didn't look trivial. > I don't have a good suggestion here. Perhaps "Callers are responsible > for" is too slow and something more terse would help? > > /* > * Adds an object to the in-memory object store, without writing it > * to disk. > * > * Use with caution! Unless you call write_object_file to record the > * in-memory object to persistent storage, any other new objects that > * reference it will point to a missing (in memory only) object, > * resulting in a corrupt repository. > */ Yeah, that's more what I had in mind. > It would be even better if we have some automated way to catch this > kind of issue. Should tests run "git fsck" after each test? Should > write_object_file have a paranoid mode that checks integrity? > > I don't know an efficient way to do that. Ultimately I am comfortable > counting on reviewers to be aware of this kind of pitfall. While > nonlocal invariants are always hard to maintain, this pitfall is > inherent in the semantics of the function, so I am not too worried > that reviewers will overlook it. Yeah, given the scope of the problem (we have a single caller, and this mechanism is over a decade old) I'm fine with review as the enforcement mechanism, too. > A less error-prone interface would tie the result of > pretend_object_file to a short-lived overlay on the_repository without > affecting global state. We could even enforce read-only access in > that overlay. I don't think the "struct repository" interface and > callers are ready for that yet, though. I agree that would be better, though it's still kind-of global (in that the repository object is effectively a global for most processes). -Peff