On 09/10/2014 10:13 AM, Jeff King wrote: > [...] > I did coincidentally have an interesting experience with our lockfile > code earlier today, which I'd like to relate. > > I was running pack-refs on a repository with a very large number of > loose refs (about 1.8 million). [...] Each call to lock_ref_sha1_basic > allocates a "struct lock_file", which then gets added to the global > lock_file list. Each one contains a fixed PATH_MAX buffer (4K on this > machine). After we're done updating the ref, we leak the lock_file > struct, since there's no way to remove it from the list. > > As a result, git tried to allocate 7G of RAM and got OOM-killed (the > machine had only 8G). In addition to thrashing the disk even harder, > since there was no room left for disk cache while we touched millions of > loose refs. :) > > Your change in this series to use a strbuf would make this a lot better. > But I still wonder how hard it would be to just remove lock_file structs > from the global list when they are committed or rolled back. That would > presumably also make the "avoid transitory valid states" patch from your > series a bit easier, too (you could prepare the lockfile in peace, and > then link it in fully formed, and do the opposite when removing it). > > I think with your strbuf patch, this leak at least becomes reasonable. > So maybe it's not worth going further. But I'd be interested to hear > your thoughts since you've been touching the area recently. I've thought about this too, but it didn't seem to be worth the effort. (Though your use case certainly adds a bit of justification.) To make that change, we would have to remove entries from the list of lock_file objects in a way that the code can be interrupted at any time by a signal while leaving it in a state that is traversable by the signal handler. I think that can be done pretty easily with a singly-linked list. But with a singly-linked list, we would have to iterate through the list to find the node that needs to be removed. This could get expensive if there are a lot of nodes in the list (see below). So we would probably want to switch to using a doubly-linked list. I think this would also be fairly simple, given that the signal handler only needs to iterate through the list in a single direction. You'd just have to be careful about adjusting the pointers in the right order to let (say) a forwards traversal always work. Then the callers who use heap-allocated lock_file objects would have to be changed to free them when they're done with them, probably using a special function that releases the strbufs, too. Callers using statically-allocated lock_file objects would probably not have to be changed. But... The ref-transaction code is, I think, moving in the direction of updating all references in a single transaction. This means that we would need to hold locks for all of the references at once anyway. So it might be all for naught. Michael -- Michael Haggerty mhagger@xxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html