On Mon, Nov 13, 2023 at 05:34:42PM -0500, Taylor Blau wrote: > > It might not be too hard to just let in-process callers access the > > objects we've written. A quick and dirty patch is below, which works > > with the test you suggested (the test still fails because it finds a > > conflict, but it gets past the "woah, I can't find that sha1" part). > > That's a very slick idea, and I think that this series has some legs to > stand on now as a result. I'm glad people seem to like it. ;) I was mostly throwing it out there as an illustration, and I don't plan on polishing it up further. But in case somebody else wants to work on it, here are random extra thoughts on the topic: - rather than teach packfile.c about bulk checkin, I think it might be cleaner to let packed_git structs have a function pointer for accessing the index (and if NULL, naturally we'd open the .idx file in the usual way). And then bulk checkin could just register the "fake" packed_git - the flip side, though, is: would it be weird for other parts of the code to ever see the fake bulk packed_git in the list? It doesn't represent a real packfile the way the other ones do. So maybe it is better to keep it separate - I suspect this scheme may violate some bounds-checking of the packfiles. For example, we haven't written the hashfile trailer yet in the still-open packfile. But I think use_pack() assumes we always have an extra the_hash_algo->rawsz bytes of padding at the end. I'm not sure if that would ever cause us to accidentally read past the end of the file. - as you mentioned, an oidmap would be an obvious replacement for the existing unsorted list - the existing already_written() function faces the same problem. I don't think anybody cared because we'd usually only bulk checkin a few huge objects. But with --write-pack, we might write a lot of such objects, and the linear scan in already_written() makes it accidentally quadratic. - speaking of already_written(): it calls repo_has_object_file() to see if we can elide the write. It probably should be using freshen_*_object() in the usual way. Again, this is an existing bug but I suspect nobody noticed because the bulk checkin code path hardly ever kicks in. -Peff