On Mon, May 6, 2013 at 7:31 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote:
>
>> We don't care about blobs, or any object other than commits, but in
>> order to find the type of object, we are parsing the whole thing, which
>> is slow, specially in big repositories with lots of big files.
>
> I did a double-take on reading this subject line and first paragraph,
> thinking "surely fast-export needs to actually output blobs?".

If you think that, then you are not familiar with the code.

--export-marks=<file>::
        Dumps the internal marks table to <file> when complete.
        Marks are written one per line as `:markid SHA-1`. Only marks
        for revisions are dumped; marks for blobs are ignored.

        if (deco->base && deco->base->type == 1) {
                mark = ptr_to_mark(deco->decoration);
                if (fprintf(f, ":%"PRIu32" %s\n", mark,
                            sha1_to_hex(deco->base->sha1)) < 0) {
                        e = 1;
                        break;
                }
        }

> Reading the patch, I see that this is only about not bothering to load
> blob marks from --import-marks. It might be nice to mention that in the
> commit message, which is otherwise quite confusing.

The commit message says it exactly like it is: we don't care about blobs.

If an object is not a commit, we *already* skip it. But as the commit
message already says, we do so by parsing the whole thing.

> I'm also not sure why your claim "we don't care about blobs" is true,
> because naively we would want future runs of fast-export to avoid having
> to write out the whole blob content when mentioning the blob again.

Because it's pointless to have hundreds and thousands of blob marks
that are *never* going to be used, only for an extremely tiny minority
that would.

> Does that match your reasoning?

It doesn't matter, it has been that way since --export-marks was introduced.

>> Before this, loading the objects of a fresh emacs import, with 260598
>> blobs took 14 minutes, after this patch, it takes 3 seconds.
>
> Presumably most of that speed improvement comes from not parsing the
> blob objects. I wonder if you could get similar speedups by applying the
> "do not bother parsing" rule from your patch 3. You would still incur
> some cost to create a "struct blob", but it may or may not be
> measurable. That would mean we get the "case not worth worrying about"
> from above for free. I doubt it would make that big a difference,
> though, given the rarity of it. So I am OK with it either way.

How would I know if it's a blob or a commit, if not by the code this
patch introduces?

-- 
Felipe Contreras