On Wed, Jan 08, 2025 at 02:14:51PM -0500, Taylor Blau wrote: > Introduce and use a new function which ensures that both parts of a > hashfile and hashfile_checkpoint pair use the same hash function > implementation to avoid such crashes. That makes sense. This should have been encapsulated all along, just like the actual hash initialization happens inside hashfile_init(). A hashfile_checkpoint is sort of inherently tied to a hashfile, right? I mean, it is recording an offset that only makes sense in the context of the parent hashfile. And that is only more true after the unsafe-hash patches, because now it needs to use the "algop" pointer from the parent hashfile (though for now we expect all hashfiles to use the same unsafe-algop, in theory we could use different checksums for each file). So in the new constructor: > +void hashfile_checkpoint_init(struct hashfile *f, > + struct hashfile_checkpoint *checkpoint) > +{ > + memset(checkpoint, 0, sizeof(*checkpoint)); > + f->algop->init_fn(&checkpoint->ctx); > +} ...should we actually record "f" itself? And then in the existing functions: > void hashfile_checkpoint(struct hashfile *f, struct hashfile_checkpoint *checkpoint) ...they'd no longer need to take the extra parameter. It creates a lifetime dependency of the checkpoint struct on the "f" it is checkpointing, but I think that is naturally modeling the domain. A semi-related thing I wondered about: do we need a destructor/release function of some kind? Long ago when this checkpoint code was added, a memcpy() of the sha_ctx struct was sufficient. But these days we use clone_fn(), which may call openssl_SHA1_Clone(), which does EVP_MD_CTX_copy_ex() under the hood. Do we have any promise that this doesn't allocate any resources that might need a call to _Final() to release (or I guess the more efficient way is directly EVP_MD_CTX_free() under the hood). My reading of the openssl manpages suggests that we should be doing that, or we may see leaks. But it may also be the case that it doesn't happen to trigger for their implementation. At any rate, we do not seem to have such a cleanup function. So it is certainly an orthogonal issue to your series. I wondered about it here because if we did have one, it would be necessary to clean up checkpoint before the hashfile due to the lifetime dependency I mentioned above. -Peff