On Tue, Feb 12, 2008 at 03:22:17PM -0500, Nicolas Pitre wrote: > On Tue, 12 Feb 2008, Martin Koegler wrote: > > We will need some additional memory for struct blob/tree/tag/commit > > even for this check. > > Why so? > > Each received object is stored in memory when received, so why can't you > simply validate it in place? No need to keep trace of them afterward. Freeing the data is not my problem. Many validations are in parse_XXX_buffer, which are also used by fsck. This returns a struct commit/tree/tag/blob. I have not found any code in git to free them. Same for pack-objects, e.g. add_objects_in_unpacked_packs allocates many struct object via lookup_unknown_object. As far as I understand the code, they are never freed, even if they are not needed later. > > > As to making sure those objects are well connected... well this is a > > > technically different issue entirely, and I wonder if a special mode to > > > fsck might not be a better solution. For example, fsck could be made to > > > validate object connectivity, starting from the new ref(s), and stopping > > > object walking as soon as a reference to an object not included in the > > > newly received pack is encountered. This could be run from some hook to > > > decide whether or not to update the new refs, and to delete the pack > > > otherwise. > > > > Do you really think, that this will need less memory? fsck loads first > > all objects and then verifies their connections. > > Not all objects otherwise I wouldn't even be able to run it. It only loads all objects, it checks (eg. no objects from the pack file by default). After loading a object, it frees the content of object, but keeps the parsed information in memory for the connectivity check. > My point is that you can have fsck load only objects contained in the > received pack (you can use the pack index to load them) and assume > connectivity is good whenever an object in the pack reference an > existing object outside of the pack. At least this doesn't need to > happen in parallel with pack indexing. So you propose the following: 1) run index-pack (write pack file + index in repository) 2) fsck pack file 3) update refs This looks to be very prone to race conditions, if multiple pushes run concurrently. To be on the safe side, the checks must be finished, before a pack/object becomes part of the repository. We must check the reachability before index-pack writes the index. In my opinion, separating the code for the reachability check from index-pack would complicate things. It would not safe memory, as the memory would be used by two processes instead of one. You don't want to increase the resource usage of index-pack. If --strict is not passed, I implemented this. For --strict, it is not avoidable, that index-pack will use more memory and cpu time. mfg Martin Kögler - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html