Jeff King <peff@xxxxxxxx> wrote: > On Tue, May 17, 2016 at 10:07:16AM +0200, Lars Schneider wrote: > > > I think that is pretty much the problem. Here is what is happening: > > > > 1. git-p4 imports all changelists for the "main" branch > > > > 2. git-p4 starts to import a second branch and creates a fast-import > > "progress checkpoint". This triggers: > > > > --> fast-import.c: cycle_packfile > > --> fast-import.c: end_packfile > > --> fast-import.c: loosen_small_pack > > > > At this point we have no packfile anymore. > > > > 3. git-p4 sends the command "commit refs/remotes/p4/depot/branch2" > > to fast-import in order to create a new branch. This triggers: > > > > --> fast-import.c: parse_new_commit > > --> fast-import.c: load_branch > > --> fast-import.c: load_tree > > > > "load_tree" calls "find_object" and the result has a "pack_id" of 0. > > I think this indicates that the object is in the packfile. Shouldn't > > the "pack_id" be "MAX_PACK_ID" instead? Yes; I think that is correct. Alternative patch to Jeff's coming in reply to this message. > > myoe = find_object(sha1); > > if (myoe && myoe->pack_id != MAX_PACK_ID) { > > Thanks for the analysis. I think this is definitely the problem. After > fast-import finalizes a packfile (either at the end of the program or > due to a checkpoint), it never discards its internal mapping of objects > to pack locations. It _has_ to keep such a mapping before the pack is > finalized, because git's regular object database doesn't know about it. > But afterwards, it should be able to rely on the object database. Almost; but relying on marks is a problem since that set can contain mark => object_entry mappings which the normal object database won't know about. > The patch below probably makes your case work, but there are a lot of > open questions: > > 1. Should we always discard the mapping, even when not loosening > objects? I can't think of a real downside to always using git's > object lookups. I'm not sure. It's safe to clear the top-level table, but it might speedup some lookups for just oe->type if we keep it around. I decided to keep it, anyways, because the mark set references them. > 2. Can we free memory associated with object_entry structs at this > point? They won't be accessible via the hash, but might other bits > of the code have kept pointers to them? Yes, invalid entries are also held in "struct mark_set marks"; this is a major problem with merely clearing the top-level object table. > I suspect it may screw up the statistics that fast-import prints at > the end, but that's a minor point. I still need to check, on that; but yeah, minor. > 3. I notice that a few other structs (branch and tag) hold onto the > pack_id, which will now point to a pack we can't access. Does this > matter? I don't think so, because checkpoint() seems to dump the > branches and tags. I don't think it matters unless a crash log or core dump is created; then it becomes confusing to the person tracking down a problem, so I've invalidated pack_id. This doesn't affect dump_branches or dump_tags from what I can tell. > 4. In general, should we be loosening objects at checkpoints at all? I think so. It should be useful to checkpoint to make objects available to other read-only processes while leaving a fast-import running indefinitely. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html