Hi!

Thanks for your explanations.

On Thursday 12 April 2012 23:30:29 Andrew Sayers wrote:
> On 12/04/12 16:28, Florian Achleitner wrote:
> > I'm not sure if storing this in a separate directory tree makes sense,
> > mostly looking at performance. All these files will only contain some
> > bytes, I guess. Andrew, why did you choose JSON?
>
> JSON has become my default storage format in recent years, so it seemed
> like the natural thing to use for a format I wanted to chuck in and get
> on with my work :)
>
> JSON is my default format because it's reasonably space-efficient,
> human-readable, widely supported and can represent everything I care
> about except recursive data structures (which I didn't need for this
> job). You can do cleverer things if you don't mind being
> language-specific (e.g. Perl's "Storable" module supports recursive data
> structures but can't be used with other languages) or if you don't mind
> needing special tools (e.g. git's index is highly efficient but can't be
> debugged with `less`). I've found you won't go far wrong if you start
> with JSON and pick something else when the requirements become more
> obvious.
>
> I gzipped the file because JSON isn't *that* space-efficient, and
> because very large repositories are likely to produce enough JSON that
> people will notice. I found that gzipping the file significantly
> reduced its size without having too much effect on run time.
>
> I've attached a sample file representing the first few commits from the
> GNU R repository. The problem I referred to obliquely before isn't with
> JSON, but with gzip - how would you add more revisions to the end of the
> file without gunzipping it, adding one line, then gzipping it again?
> One very nice feature of a directory structure is that you could store
> it in git and get all that stuff for free.
>
> To be clear, I'm not pushing any particular solution to this problem,
> just offering some anecdotal evidence.
> I'm pretty sure that SVN branch export is an I/O bound problem - David
> Barr has said much the same about svn-fe, but I was surprised to see it
> was still the bottleneck with a problem that stripped out almost all the
> data from the dump and pushed it through not-particularly-optimised
> Perl. Having said that, the initial import problem (potentially hundreds
> of thousands of revisions needing manual attention) doesn't necessarily
> want the same solution as update (tens of revisions that can almost
> always be read automatically).

JSON seems to be a good initial choice.

> >> . tracing history past branch creation events, using the now-saved
> >>   copyfrom information.
> >>
> >> . tracing second-parent history using svn:mergeinfo properties.
> >
> > This is about detection when to create a git merge-commit, right?
>
> Yes - SVN has always stored metadata about where a directory was copied
> from (unlike git, which prefers to detect it automatically), and since
> version 1.5, SVN has added "svn:mergeinfo" metadata to files and
> directories specifying which revisions of which other files or
> directories have been cherry-picked into them.
>
> If you know a directory is a branch, "copyfrom" metadata is a very
> useful signal for detecting branches created from it. Unfortunately,
> "svn:mergeinfo" is not as useful - aside from anything else, older
> repositories often exhibit a period where there's no metadata at all,
> then a gradual migration through SVN's early experiments with merge
> tracking (like svnmerge.py), before everyone gradually standardises on
> svn:mergeinfo and leaves the other tools behind. Oh, and the interface
> doesn't tell you about unmerged revisions, so if anybody ever forgets to
> merge a revision then you'll probably never notice.

This doesn't look very straightforward. In the svn docs they say there is
a command that outputs which changesets are eligible to merge.
http://svnbook.red-bean.com/en/1.7/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.mergeinfo

But I don't know if that helps.

> I'm planning to tackle this stuff in the work I'm doing, but I expect
> people will be reporting edge cases until the day the last SVN
> repository shuts down. You shouldn't need to worry about it much on the
> git side of SBL, which is probably best for your sanity ;)

:)

> - Andrew
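
On the gzip question raised earlier in the thread (adding revisions without
gunzipping and re-gzipping the whole file): one possible approach, offered
only as a sketch, relies on the gzip format allowing several compressed
members to be concatenated in one file. The example below assumes one JSON
object per line; the field names ("rev", "branch", "copyfrom") and the file
name are hypothetical and are not taken from Andrew's sample file.

```python
import gzip
import json
import os
import tempfile


def append_revisions(path, revisions):
    """Append one JSON object per line as a new gzip member."""
    # Mode "ab" writes a fresh gzip member after the existing bytes;
    # nothing already in the file is decompressed or rewritten.
    with gzip.open(path, "ab") as out:
        for rev in revisions:
            line = json.dumps(rev, sort_keys=True) + "\n"
            out.write(line.encode("utf-8"))


def read_revisions(path):
    """Read every JSON line back, across all gzip members."""
    # Python's gzip reader continues through concatenated members.
    with gzip.open(path, "rt", encoding="utf-8") as src:
        return [json.loads(line) for line in src if line.strip()]


if __name__ == "__main__":
    # Hypothetical file name and records, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "branches.json.gz")
    append_revisions(path, [{"rev": 1, "branch": "trunk"}])
    append_revisions(path, [{"rev": 2, "branch": "branches/1.0",
                             "copyfrom": "trunk@1"}])
    for record in read_revisions(path):
        print(record)
```

Each appended member carries its own header and trailer and is compressed
independently, so appending a handful of revisions at a time will compress
worse than one large batch. Whether that trade-off is acceptable here is
an open question.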