Re: [PATCH 4/4] fast-import: only store commit objects

On 05/06/2013 11:19 PM, Felipe Contreras wrote:
> On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
>>
>>> Yes, it can be handy to start loading the first "blobfile" in parallel
>>> with the later stages of the conversion, before the second "dumpfile" is
>>> ready.  In that case the user needs to pass --export-marks to the first
>>> fast-import process to export marks on blobs so that the marks can be
>>> passed to the second fast-import via --import-marks.
>>>
>>> So the proposed change would break a documented use of cvs2git.
>>>
>>> Making the export of blob marks optional would of course be OK, as long
>>> as the default is to export them.
>>
>> Thanks for a concise summary.  Your use case fits exactly what
>> Felipe conjectured as the nonexistent minority.
> 
> Not true. cvs2git does *not* rely on the blobs being stored in a marks
> file, because cvs2git does not rely on mark files at all.
> 
>> An option that lets the caller say "I only care about marks on these
>> types of objects to be written to (and read from) the exported marks
>> file" would help Felipe's use case without harming your use case,
>> and would be a sane and safe way to go.
> 
> His case is not harmed at all. It's only the unfortunate command that
> is mentioned in the documentation that didn't need to be mentioned at
> all in the first place.
> 
> It should be the other way around, if it's only this documentation
> that is affected, we could add a switch for that particular command,
> and the documentation should be updated, but it's overkill to add a
> switch for one odd command in some documentation somewhere, it would
> be much better to update the odd command to avoid using marks at all,
> which is what the more appropriate command does, right below in the
> same documentation.
> 
>   cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
> 
> Should the rest of the real world be punished because somebody added a
> command in some documentation somewhere, which wasn't actually needed
> in the first place?

Don't get too fixated on the documentation.  The documentation just
gives some examples of how cvs2git can be used.

The reason that cvs2git outputs two files is that the first file is
emitted at the very beginning of the conversion and the second at the
very end.  These conversions can take a long time (> 1 day for very big
repos), can be interrupted and restarted between "passes", and passes
can even be re-run with changed configurations.

CVS write access has to be turned off before the start of the final
conversion, so no VCS is usable until the conversion is over.  Users
are therefore very interested in keeping the downtime minimal.  The blobfile
can also be unwieldy (its size is approximately the sum of the sizes of
all revisions of all files in the project).  Being able to load the
blobfile into one fast-import process and the dumpfile into a different
process (which relies on the feature that you propose removing) opens up
a lot of possibilities:

* The first fast-import of the blobfile can be started as soon as the
blobfile is complete and run in parallel with the rest of the conversion.

* If the blobfile needs to be transferred over the network (e.g.,
because Git will be served from a different server than the one doing
the conversion) the network transfer can also be done in parallel with
the rest of the conversion.

* The blobfile could be written to a named pipe that is being read by a
git-fast-import process, to avoid having to write the blobfile to disk
in the first place.

* The user could run "git repack" between loading the blobfile and
loading the dumpfile.

These are just the ways that cvs2git does and/or could benefit from the
flexibility that is now in git-fast-import.  Other tools might also be
using git-fast-import in ways that would be broken by your proposed change.
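
The split load described above (blobfile into one fast-import process,
dumpfile into another) can be sketched roughly as follows.  This is only
an illustration: the file names blob.dat, dump.dat and blob.marks, the
committer identity, the timestamp and the two tiny hand-written streams
are assumptions standing in for real cvs2git output.

```shell
#!/bin/sh
# Sketch only: the file names (blob.dat, dump.dat, blob.marks) and the tiny
# hand-written streams are stand-ins, not real cvs2git output.
set -e

work=$(mktemp -d)
git init -q "$work/repo"

# Stand-in for the blobfile: one blob, tagged with mark :1.
printf 'blob\nmark :1\ndata 6\nhello\n\n' > "$work/blob.dat"

# Stand-in for the dumpfile: one commit that refers to the blob only by mark.
{
  printf 'commit refs/heads/master\n'
  printf 'committer Example User <user@example.com> 1367877540 +0000\n'
  printf 'data 7\nimport\n'
  printf 'M 100644 :1 hello.txt\n\n'
} > "$work/dump.dat"

# First process: load the blobs and export their marks.
git -C "$work/repo" fast-import --quiet \
    --export-marks="$work/blob.marks" < "$work/blob.dat"

# Second process: import the marks so that ":1" resolves to the stored blob.
git -C "$work/repo" fast-import --quiet \
    --import-marks="$work/blob.marks" < "$work/dump.dat"

# The file content reached through the commit should be the original blob.
content=$(git -C "$work/repo" cat-file -p refs/heads/master:hello.txt)
echo "$content"
```

The second process sees only the dumpfile, so the exported marks file is
its only way to resolve ":1".  If blob marks were no longer exported, the
second run would have nothing to map that mark to.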

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/