Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs

On Mon, May 6, 2013 at 10:08 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote:
>>
>>> We don't care about blobs, or any object other than commits, but in
>>> order to find the type of object, we are parsing the whole thing, which
>>> is slow, especially in big repositories with lots of big files.
>>
>> I did a double-take on reading this subject line and first paragraph,
>> thinking "surely fast-export needs to actually output blobs?".
>>
>> Reading the patch, I see that this is only about not bothering to load
>> blob marks from --import-marks. It might be nice to mention that in the
>> commit message, which is otherwise quite confusing.
>
> I had the same reaction first, but not writing the blob _objects_
> out to the output stream would not make any sense, so it was fairly
> easy to guess what the author wanted to say ;-).

That's how fast-export has worked since --export-marks was introduced.

>> I'm also not sure why your claim "we don't care about blobs" is true,
>> because naively we would want future runs of fast-export to avoid having
>> to write out the whole blob content when mentioning the blob again.
>
> The existing documentation is fairly clear that marks for objects
> other than commits are not exported, and the import-marks codepath
> discards anything but commits, so there is no mechanism for the
> existing fast-export users to leave blob marks in the marks file for
> later runs of fast-export to take advantage of.  The second
> invocation cannot refer to such a blob in the first place.
>
> The story is different on the fast-import side, where we do say we
> dump the full table and a later run can depend on these marks.

Yes, and gaining nothing but increased disk space.

> By discarding marks on blobs, we may be robbing some optimization
> possibilities, and by discarding marks on tags, we may be robbing
> some features, from users of fast-export; we might want to add an
> option "--use-object-marks={blob,commit,tag}" or something to both
> fast-export and fast-import, so that the former can optionally write
> marks for non-commits out, and the latter can omit non-commit marks
> if the user does not need them. But that is a separate issue.

How? The only way we might rob optimizations is if there's an obscene
number of files; otherwise the number of blob marks that we are
*actually* going to use ever again is extremely tiny.
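
For reference, the behaviour under discussion (loading only commit marks
from --import-marks, and deciding by object type alone) can be sketched
roughly as follows. This is an illustrative Python sketch, not the patch
itself: load_commit_marks and the object_type callback are hypothetical
names, and the only thing taken from git is the marks file layout of one
":<mark> <object id>" entry per line.

```python
from typing import Callable, Dict, Iterable


def load_commit_marks(
    lines: Iterable[str],
    object_type: Callable[[str], str],
) -> Dict[int, str]:
    """Return {mark: object id} for the commit entries of a marks file.

    `object_type` is a hypothetical callback that returns "commit",
    "blob", "tree" or "tag" for an object id. The assumption is that it
    can answer without reading the object's content.
    """
    marks: Dict[int, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Each entry has the form ":<mark> <object id>".
        mark_field, _, oid = line.partition(" ")
        if not mark_field.startswith(":") or not oid:
            raise ValueError(f"corrupt mark line: {raw!r}")
        if object_type(oid) != "commit":
            # Non-commit entries are dropped; their content is never read.
            continue
        marks[int(mark_field[1:])] = oid
    return marks
```

Against a real repository, one option for object_type would be a small
wrapper around "git cat-file -t <object id>", which prints the object's
type; whether that is fast enough for a large marks file is untested here.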

-- 
Felipe Contreras