On Mon, May 6, 2013 at 10:08 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote:
>>
>>> We don't care about blobs, or any object other than commits, but in
>>> order to find the type of object, we are parsing the whole thing, which
>>> is slow, specially in big repositories with lots of big files.
>>
>> I did a double-take on reading this subject line and first paragraph,
>> thinking "surely fast-export needs to actually output blobs?".
>>
>> Reading the patch, I see that this is only about not bothering to load
>> blob marks from --import-marks. It might be nice to mention that in the
>> commit message, which is otherwise quite confusing.
>
> I had the same reaction first, but not writing the blob _objects_
> out to the output stream would not make any sense, so it was fairly
> easy to guess what the author wanted to say ;-).

That's how fast-export has worked since --export-marks was introduced.

>> I'm also not sure why your claim "we don't care about blobs" is true,
>> because naively we would want future runs of fast-export to avoid having
>> to write out the whole blob content when mentioning the blob again.
>
> The existing documentation is fairly clear that marks for objects
> other than commits are not exported, and the import-marks codepath
> discards anything but commits, so there is no mechanism for the
> existing fast-export users to leave blob marks in the marks file for
> later runs of fast-export to take advantage of. The second
> invocation cannot refer to such a blob in the first place.
>
> The story is different on the fast-import side, where we do say we
> dump the full table and a later run can depend on these marks.

Yes, and gaining nothing but increased disk-space.
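
To make the --import-marks behavior discussed above concrete, here is a
rough sketch of the filtering. It is Python for illustration only, not
git's actual C code. It assumes the marks file has one ":<id> <object name>"
entry per line; load_commit_marks and the object_type callback are
hypothetical names, with the callback standing in for any lookup that
reports an object's type (as "git cat-file -t" does) without reading the
object's content.

```python
from typing import Callable, Dict, Iterable


def load_commit_marks(
    lines: Iterable[str],
    object_type: Callable[[str], str],
) -> Dict[int, str]:
    """Return {mark id: object name} for the commit entries only."""
    marks: Dict[int, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        mark, name = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError(f"malformed marks line: {raw!r}")
        # Hypothetical lookup: only the type is needed, never the content.
        if object_type(name) != "commit":
            continue  # marks for blobs, trees and tags are dropped
        marks[int(mark[1:])] = name
    return marks


if __name__ == "__main__":
    # Made-up object names and types, for illustration only.
    fake_types = {"a" * 40: "commit", "b" * 40: "blob"}
    sample = [":1 " + "a" * 40, ":2 " + "b" * 40]
    print(load_commit_marks(sample, fake_types.__getitem__))
```

With the made-up input above, only mark 1 (the commit) is kept, so the cost
of the load depends on how cheaply the type of each entry can be
determined.
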
> By discarding marks on blobs, we may be robbing some optimization
> possibilities, and by discarding marks on tags, we may be robbing
> some features, from users of fast-export; we might want to add an
> option "--use-object-marks={blob,commit,tag}" or something to both
> fast-export and fast-import, so that the former can optionally write
> marks for non-commits out, and the latter can omit non commit marks
> if the user do not need them. But that is a separate issue.

How? The only way we might rob optimizations is if there's an obscene
number of files; otherwise the number of blob marks that we are
*actually* going to use ever again is extremely tiny.

-- 
Felipe Contreras