On 01/05/2013 04:11 PM, Eric S. Raymond wrote:
> Perhaps I was unclear. I consider the interface design error to
> be not in the fact that all the blobs are written first or detached,
> but rather that the implementation detail of the two separate journal
> files is ever exposed.
>
> I understand why the storage of intermediate results was done this
> way, in order to decrease the tool's working set during the run, but
> finishing by automatically concatenating the results and streaming
> them to stdout would surely have been the right thing here.

cvs2svn/cvs2git is built to be able to handle very large CVS
repositories, not only those that can fit in RAM. This goal influences
a lot of its design, including the pass-by-pass structure with
intermediate databases and the resumability of passes.

The blobfile necessarily contains every version of every file, with no
delta-encoding and no compression. Its size can be a large multiple of
the on-disk size of the original CVS repository. If the "save to
tempfiles then cat tempfiles at end of run" behavior were hard-coded
into cvs2git, then there would be no way to avoid requiring enough
temporary space to hold the whole blobfile.

Writing the blobfile into a separate file, on the other hand, means that
for example the blobfile could be written into a named pipe connected to
the standard input of "git fast-import" [1]. "git fast-import" could
even be run on a remote server. I consider these bigger advantages than
the ability to pipe the output of cvs2git directly into another command.

> The downstream cost of letting the journalling implementation be
> exposed, instead, can be seen in this snippet from the new git-cvsimport
> I've been working on:
>
>     def command(self):
>         "Emit the command implied by all previous options."
>         return "(cvs2git --username=git-cvsimport --quiet --quiet --blobfile={0} --dumpfile={1} {2} {3} && cat {0} {1} && rm {0} {1})".format(tempfile.mkstemp()[1], tempfile.mkstemp()[1], self.opts, self.modulepath)
>
> According to the documentation, every caller of cvs2git must go
> through analogous contortions! This is not the Unix way; if Unix
> design principles had been minimally applied, that second line would
> just read like this:
>
>         return "cvs2git --username=git-cvsimport --quiet --quiet"

Never in my worst nightmares did I imagine that my terrible design taste
would force you to type an extra two lines of code. Oh the humanity!

By the way, patches are welcome. And you don't need to trumpet their
imminent arrival [2] or malign the existing code beforehand. Moreover,
it would be adequate if you just demonstrate working code and *then* ask
for "sign-in", rather than the other way around.

> If Unix design principles had been thoroughly applied, the "--quiet
> --quiet" part would be unnecessary too - well-behaved Unix commands
> *default* to being completely quiet unless either (a) they have an
> exceptional condition to report, or (b) their expected running time is
> so long that tasteful silence would leave users in doubt that they're
> working.

cvs2git is not a command that one uses 100 times a day. It is a tool
for one-shot conversions of CVS repositories to git. These conversions
can take hours or even days of processing time (not to mention the time
for configuring the conversion and changing the rest of a project's
infrastructure from CVS to git). So yes, I think we would like to
appeal to (b) and humbly ask for your permission to give the user some
feedback during the conversion.

> (And yes, I do think violating these principles is a lapse of taste when
> git tools do it, too.)
>
> Michael Haggerty wants me to trust that cvs2git's analysis stage has
> been fixed, but I must say that is a more difficult leap of faith when
> two of the most visible things about it are still (a) a conspicuous
> instance of interface misdesign, and (b) documentation that is careless and
> incomplete.

The cvs2git documentation is lacking; I admit it (as opposed to the
cvs2svn documentation, which I think is quite complete). And the
program itself also has a lot of rough edges, for example its inability
to convert .cvsignore files into .gitignore files. Patches are welcome.

I haven't used cvs2svn for my own purposes in many years and I've
*never* once had a need to use cvs2git; I maintain these programs purely
as a service to the community. Most of the community seems satisfied
with the programs as they are, and if not they usually submit courteous
and concrete bug reports or submit patches. I request that you follow
their example.

I especially ask that you refrain from spreading public FUD about
imagined problems based on speculation. Please do your tests and *then*
report any problems that you find.

Yours,
Michael

[1] In fact, the current implementation of generate_blobs.py sometimes
seeks back to earlier parts of the blob file when it needs the fulltext
of a revision that has already been output, but this would be easy to
change as soon as somebody needs it.

[2] http://comments.gmane.org/gmane.comp.version-control.git/212340

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
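
The named-pipe arrangement mentioned in the message (blobfile written into a pipe that another process reads concurrently) can be illustrated with a minimal Python sketch. It shows only the general technique: a producer writes a stream into a FIFO while a consumer reads it at the same time, so the whole stream never has to be stored on disk. The names stream_through_fifo, write_blobs and count_bytes are hypothetical stand-ins and are not part of cvs2git or git; in a real conversion the producer would be cvs2git writing its blobfile and the consumer would be "git fast-import". As footnote [1] points out, generate_blobs.py currently sometimes seeks back in the blob file, so this is not expected to work with cvs2git unchanged. POSIX only.

```python
import os
import tempfile
import threading


def stream_through_fifo(producer, consumer):
    """Connect producer(write_file) to consumer(read_file) via a named pipe.

    Both callables are hypothetical stand-ins: the producer plays the role
    of the tool writing its blobfile, the consumer the role of the importer.
    Returns whatever the consumer returns.
    """
    tmpdir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmpdir, "blobfile.fifo")
    os.mkfifo(fifo_path)  # create the named pipe (POSIX only)
    result = []

    def run_consumer():
        # Opening a FIFO for reading blocks until a writer opens it too.
        with open(fifo_path, "rb") as src:
            result.append(consumer(src))

    reader = threading.Thread(target=run_consumer)
    reader.start()
    try:
        # Closing the write end signals end-of-stream to the consumer.
        with open(fifo_path, "wb") as dst:
            producer(dst)
    finally:
        reader.join()
        os.remove(fifo_path)
        os.rmdir(tmpdir)
    return result[0]


def write_blobs(dst):
    # Stand-in for the blob stream: 1000 chunks of 1 KiB each.
    for _ in range(1000):
        dst.write(b"x" * 1024)


def count_bytes(src):
    # Stand-in for the importer: consume the stream and count its bytes.
    total = 0
    while True:
        chunk = src.read(65536)
        if not chunk:
            return total
        total += len(chunk)


if __name__ == "__main__":
    print(stream_through_fifo(write_blobs, count_bytes))
```

The point of the design is that the data passes through the kernel's pipe buffer rather than a temporary file, which is the space saving the message argues a hard-coded "cat the tempfiles at the end" step would rule out.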