[RFCv3 3/4] Third draft of CVS remote helper program

Implements the import of objects from a local or remote CVS repository.

This helper program uses the "git_remote_cvs" Python package introduced
earlier, and provides a working draft implementation of the remote helper
API, as described in Documentation/git-remote-helpers.txt. Further details
about this specific helper are described in the new
Documentation/git-remote-cvs.txt.

This patch has been improved by the following contributions:
- Daniel Barkalow: Updates reflecting changes in remote helper API

Signed-off-by: Johan Herland <johan@xxxxxxxxxxx>
---
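Note on the usernameMap format: the TODO list in git-remote-cvs.py still
lists author map handling as open, while Documentation/git-remote-cvs.txt
already describes the file format and the fallback address. The following is
only a sketch of how such a file could be parsed, not code from this patch.
The names parse_username_map() and resolve_author() are hypothetical, and
silently skipping malformed lines is an assumption. It is written to run
under Python 3, unlike the Python 2 helper itself.

```python
import re

# Hypothetical helpers for illustration; they are not part of the patch.
# One entry per line: "username: Full Name <email@address>"
_ENTRY = re.compile(
    r"^(?P<user>[^:\s]+)\s*:\s*(?P<name>.*?)\s*<(?P<email>[^<>]*)>\s*$")


def parse_username_map(lines):
    """Return {cvs_username: (full_name, email)} from usernameMap lines."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comment lines are ignored
        match = _ENTRY.match(line)
        if match is None:
            continue  # assumption: malformed entries are skipped silently
        mapping[match.group("user")] = (
            match.group("name"), match.group("email"))
    return mapping


def resolve_author(username, mapping):
    """Look up a CVS username, falling back to '<username>@example.com'."""
    return mapping.get(username, (username, "%s@example.com" % username))
```

With a map built this way, the two author arguments passed to GitFICommit in
import_changesets() could presumably be taken from
resolve_author(c.author, mapping) instead of being hard-coded.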
 Documentation/git-remote-cvs.txt |   85 +++++
 Makefile                         |   24 ++
 git-remote-cvs.py                |  697 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 806 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-remote-cvs.txt
 create mode 100755 git-remote-cvs.py

diff --git a/Documentation/git-remote-cvs.txt b/Documentation/git-remote-cvs.txt
new file mode 100644
index 0000000..783d542
--- /dev/null
+++ b/Documentation/git-remote-cvs.txt
@@ -0,0 +1,85 @@
+git-remote-cvs(1)
+=================
+
+NAME
+----
+git-remote-cvs - Helper program for interoperation with CVS repositories
+
+SYNOPSIS
+--------
+'git remote-cvs' <remote>
+
+DESCRIPTION
+-----------
+
+Please see the linkgit:git-remote-helpers[1] documentation for general
+information about remote helper programs.
+
+CONFIGURATION
+-------------
+
+remote.*.cvsRoot::
+	The URL of the CVS repository (as found in a `CVSROOT` variable, or
+	in a `CVS/Root` file).
+	Example: "`:pserver:user@server/var/cvs/cvsroot`".
+
+remote.*.cvsModule::
+	The path of the CVS module (as found in a `CVS/Repository` file)
+	within the CVS repository specified in `remote.*.cvsRoot`.
+	Example: "`foo/bar`"
+
+remote.*.cachedSymbolsOnly::
+	When 'true', a cache of CVS symbols is used instead of querying the
+	CVS server for all existing symbols (potentially expensive). In this
+	mode, git-remote-cvs will not discover new CVS symbols unless you add
+	them explicitly with the "`addsymbol <symbol>`" command (on
+	git-remote-cvs's stdin), or request an explicit symbol cache update
+	from the CVS server with the "`syncsymbols`" command (on
+	git-remote-cvs's stdin). When 'false' (the default), the CVS server
+	will be queried whenever a list of CVS symbols is required.
+
+remote.*.usernameMap::
+	The path (absolute, or relative to the repository (NOT the worktree))
+	to the file that contains the mapping from CVS usernames to the
+	corresponding full names and email addresses, as used by Git in the
+	Author and Committer fields of commit objects. When this config
+	variable is set, CVS usernames will be resolved against this file.
+	If no match is found in the file, or if this config variable is unset,
+	or if the variable points to a non-existing file, the original CVS
+	username will be used as the Author/Committer name, and the
+	corresponding email address will be set to "`<username>@example.com`".
++
+The format of the usernameMap file is one entry per line, where each line is
+of the form "`username: Full Name <email@address>`".
+Example: `johndoe: John Doe <johndoe@xxxxxxxxxxx>`
+Blank lines and lines starting with '#' are ignored.
+
+COMMANDS
+--------
+
+In addition to the commands that constitute the git-remote-helpers API, the
+following extra commands are supported for managing the local symbol cache when
+the `remote.*.cachedSymbolsOnly` config variable is true. The following
+commands can be given on the standard input of git-remote-cvs:
+
+'addsymbol'::
+	Takes one CVS symbol name as argument. The given CVS symbol is
+	fetched from the CVS server and stored into the local CVS symbol
+	cache. If `remote.*.cachedSymbolsOnly` is enabled, this can be used
+	to introduce a new CVS symbol to the CVS helper application.
+
+'syncsymbols'::
+	All CVS symbols that are available from the given remote are
+	fetched from the CVS server and stored into the local CVS symbol
+	cache. This is equivalent to disabling `remote.{asterisk}.cachedSymbolsOnly`,
+	running the "list" command, and then finally re-enabling the
+	`remote.*.cachedSymbolsOnly` config variable. I.e. this command can
+	be used to manually synchronize the CVS symbols available to the
+	CVS helper application.
+
+'verify'::
+	Takes one CVS symbol name as argument. Verifies that the CVS symbol
+	has been successfully imported by checking out the CVS symbol from
+	the CVS server, and comparing the CVS working tree against the Git
+	tree object identified by `refs/cvs/<remote>/<symbol>`. This can be
+	used to verify the correctness of a preceding 'import' command.
diff --git a/Makefile b/Makefile
index bb5cea2..b2af678 100644
--- a/Makefile
+++ b/Makefile
@@ -350,6 +350,8 @@ SCRIPT_PERL += git-relink.perl
 SCRIPT_PERL += git-send-email.perl
 SCRIPT_PERL += git-svn.perl
 
+SCRIPT_PYTHON += git-remote-cvs.py
+
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
 	  $(patsubst %.py,%,$(SCRIPT_PYTHON)) \
@@ -1474,6 +1476,28 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)) git-instaweb: % : unimplemented.sh
 	mv $@+ $@
 endif # NO_PERL
 
+ifndef NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
+	$(QUIET_GEN)$(RM) $@ $@+ && \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory instlibdir` && \
+	sed -e '1{' \
+	    -e '	s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
+	    -e '}' \
+	    -e 's|^import sys.*|&; sys.path.insert(0, "@@INSTLIBDIR@@")|' \
+	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
+	    $@.py >$@+ && \
+	chmod +x $@+ && \
+	mv $@+ $@
+else # NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): % : unimplemented.sh
+	$(QUIET_GEN)$(RM) $@ $@+ && \
+	sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
+	    -e 's|@@REASON@@|NO_PYTHON=$(NO_PYTHON)|g' \
+	    unimplemented.sh >$@+ && \
+	chmod +x $@+ && \
+	mv $@+ $@
+endif # NO_PYTHON
+
 configure: configure.ac
 	$(QUIET_GEN)$(RM) $@ $<+ && \
 	sed -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
diff --git a/git-remote-cvs.py b/git-remote-cvs.py
new file mode 100755
index 0000000..1720d4c
--- /dev/null
+++ b/git-remote-cvs.py
@@ -0,0 +1,697 @@
+#!/usr/bin/env python
+
+"""Usage: git-remote-cvs <remote> [<url>]
+
+Git remote helper for interacting with CVS repositories
+
+See git-remote-helpers documentation for details on external interface, usage,
+etc. See git-remote-cvs documentation for specific configuration details of
+this remote helper.
+"""
+
+# PRINCIPLES:
+# -----------
+# - Importing same symbol twice (with no CVS changes in between) should yield
+#   the exact same Git state (and the second import should have no commits).
+# - Importing several CVS symbols pointing to the same state should yield
+#   corresponding refs pointing to the _same_ commit in Git.
+# - Importing a CVS symbol which has received only "regular commits" since
+#   last import should yield a fast-forward straight line of commits.
+
+# TODO / KNOWN PROBLEMS:
+# ----------------------
+# - Remove cachedSymbolsOnly config variable for now?
+# - Author map handling; mapping CVS usernames to Git full name + email address
+# - Handle files that have been created AND deleted since the last import
+# - How to handle CVS tags vs. CVS branches. Turn CVS tags into Git tags?
+# - Better CVS branch handling: When a branch has a super/subset of files/revs
+#   compared to another branch, find a way to base one branch on the other
+#   instead of creating parallel lines of development with roughly the same
+#   commits.
+# - Profiling, optimizations...
+
+import sys, string, os
+
+from git_remote_cvs.util import *
+from git_remote_cvs.cvs  import *
+from git_remote_cvs.git  import *
+from git_remote_cvs.cvs_symbol_cache import CvsSymbolCache
+from git_remote_cvs.commit_states    import CommitStates
+from git_remote_cvs.cvs_revision_map import CvsRevisionMap, CvsStateMap
+from git_remote_cvs.changeset        import build_changesets_from_revs
+
+class Config (object):
+	# Author name/email tuple for commits created by this tool
+	Author = ("git remote-cvs", "invalid@xxxxxxxxxxx")
+
+	# Git remote name
+	Remote = None
+
+	# CVS symbols are imported into this refs namespace/directory
+	RefSpace = None
+
+	# Git notes ref, the refname pointing to our git notes
+	NotesRef = None
+
+	# CVS repository identifier, a 2-tuple (cvs_root, cvs_module), where
+	# cvs_root is the CVS server/repository URL (as found in $CVSROOT, or
+	# in a CVS/Root file), and cvs_module is the path to a CVS module
+	# relative to the CVS repository (as found in a CVS/Repository file)
+	CvsRepo = (None, None)
+
+	# Path to the git-remote-cvs cache/work directory
+	# (normally "info/cvs/$remote" within $GIT_DIR)
+	WorkDir = None
+
+	# If False, the list of CVS symbols will always be retrieved from the
+	# CVS server using 'cvs rlog'. If True, only the cached symbols within
+	# the "symbols" subdirectory of WorkDir are consulted.
+	CachedSymbolsOnly = False
+
+	@classmethod
+	def init (cls, remote):
+		"""Fetch configuration parameters for the given remote"""
+		git_config = parse_git_config()
+		assert git_config["remote.%s.vcs" % (remote)] == "cvs"
+
+		cls.Author = (
+			git_config["user.name"], git_config["user.email"])
+		cls.Remote = remote
+		cls.RefSpace = "refs/cvs/%s/" % (remote)
+		cls.NotesRef = "refs/notes/cvs/%s" % (remote)
+		cls.CvsRepo = (
+			git_config["remote.%s.cvsroot" % (remote)],
+			git_config["remote.%s.cvsmodule" % (remote)])
+		cls.WorkDir = os.path.join(get_git_dir(), "info/cvs", remote)
+		cls.CachedSymbolsOnly = git_config_bool(git_config.get(
+			"remote.%s.cachedsymbolsonly" % (remote), "false"))
+
+def work_path (*args):
+	"""Return the given path appended to git-remote-cvs's cache/work dir"""
+	return os.path.join(Config.WorkDir, *args)
+
+def cvs_to_refname (cvsname):
+	"""Return the git ref name for the given CVS symbolic name"""
+	if cvsname.startswith(Config.RefSpace): # Already converted
+		return cvsname
+	return Config.RefSpace + cvsname
+
+def ref_to_cvsname (refname):
+	"""Return the CVS symbolic name for the given git ref name"""
+	if refname.startswith(Config.RefSpace):
+		return refname[len(Config.RefSpace):]
+	return refname
+
+def valid_cvs_symbol (symbol):
+	"""Return True iff the given CVS symbol can be imported into Git"""
+	return valid_git_ref(cvs_to_refname(symbol))
+
+def die_usage (msg, *args):
+	# Use this file's docstring as a usage string
+	print >>sys.stderr, __doc__
+	die(msg, *args)
+
+def import_cvs_revs (symbol, prev_state, cur_state, progress):
+	"""Import the CVS revisions involved in importing the given CVS symbol
+
+	This method will determine the CVS revisions involved in moving from
+	the given prev_state to the given cur_state. This includes looking at
+	revision metadata in CVS, and importing needed blobs from CVS.
+
+	The revision metadata is returned as a 2-level dict of CvsRev objects:
+	mapping path -> revnum -> CvsRev object.
+	"""
+
+	# Calculate the revisions involved in moving from prev_state to
+	# cur_state, and fetch CvsRev objects for these revisions.
+	progress.pushprefix("Importing CVS revisions: ")
+	paths = set(prev_state.paths()).union(cur_state.paths())
+	num_fetched_revs   = 0 # Number of CvsRev objects involved
+	num_imported_blobs = 0 # Number of blobs actually imported
+	cvs_revs = {} # path -> revnum -> CvsRev
+	for i, path in enumerate(sorted(paths)):
+		progress.pushprefix("(%i/%i) %s: " % (i + 1, len(paths), path))
+		progress("")
+		prev_rev = prev_state.get(path)
+		cur_rev = cur_state.get(path)
+		if prev_rev and cur_rev and prev_rev == cur_rev:
+			# No changes since last import
+			progress.popprefix()
+			continue
+
+		# Fetch CvsRev objects for range [path:prev_rev, path:symbol]
+		path_revs = fetch_revs(path, prev_rev, cur_rev, symbol,
+			Config.CvsRepo)
+		if not path_revs:
+			# Failed to find revs between prev_rev and symbol
+			if cur_rev:
+				assert not cur_rev.follows(prev_rev)
+				# The CVS symbol has been moved/reset since the
+				# last import in such a way that we cannot
+				# deduce the history between the last import
+				# and the current import.
+				# FIXME: Can we work around this?
+				die("CVS symbol %s has been moved/reset from" \
+					" %s:%s to %s:%s since the last" \
+					" import. This is not supported",
+					symbol, path, prev_rev, path, cur_rev)
+			else:
+				# CVS symbol has been removed from this path.
+				# We cannot conclusively determine the history
+				# of this path following prev_rev.
+				# FIXME: Can we work around this?
+				die("CVS symbol %s has been removed from %s" \
+					" since the last import. This is not" \
+					" supported", symbol, path)
+
+		# OK. We've got the revs in range [prev_rev, symbol]
+
+		# Verify/determine cur_rev
+		real_cur_rev = max(path_revs.keys())
+		if cur_rev: assert cur_rev == real_cur_rev
+		else: cur_rev = real_cur_rev
+
+		# No need to re-import prev_rev if already imported
+		if prev_rev:
+			assert cur_rev.follows(prev_rev)
+			assert prev_rev in path_revs
+			del path_revs[prev_rev]
+
+		assert path_revs # There should be more revs than just prev_rev
+
+		# Sanity checks:
+		# All revs from prev_rev to cur_rev are about to be imported
+		check_rev = cur_rev
+		while check_rev and check_rev != prev_rev:
+			assert check_rev in path_revs
+			check_rev = check_rev.parent()
+		# All previous revs have already been imported
+		check_rev = prev_rev
+		while check_rev:
+			assert Globals.CvsRevisionMap.has_rev(path, check_rev)
+			check_rev = check_rev.parent()
+
+		# Import CVS revisions as Git blobs
+		j = 0
+		for num, rev in sorted(path_revs.iteritems(), reverse = True):
+			j += 1
+			progress("(%i/%i) %s" % (j, len(path_revs), num))
+			assert num == rev.num
+
+			# Skip if already imported
+			if Globals.CvsRevisionMap.has_rev(rev.path, rev.num):
+				continue
+			# ...or if rev is a deletion
+			elif rev.deleted:
+				continue
+
+			# Import blob for reals
+			data = Globals.CvsWorkDir.get_revision_data(
+				rev.path, rev.num)
+			Globals.GitFastImport.comment(
+				"Importing CVS revision %s:%s" % (
+				rev.path, rev.num))
+			mark = Globals.GitFastImport.blob(data)
+			Globals.CvsRevisionMap.add_blob(
+				rev.path, rev.num, mark)
+			num_imported_blobs += 1
+
+		# Add path_revs to the overall structure of revs to be imported
+		assert path not in cvs_revs
+		cvs_revs[path] = path_revs
+		num_fetched_revs += len(path_revs)
+
+		progress.popprefix()
+
+	progress.popprefix()
+	progress("Imported %i blobs (reused %i existing blobs)" % (
+		num_imported_blobs, num_fetched_revs - num_imported_blobs),
+		True)
+
+	return cvs_revs
+
+def advance_state (state, changeset):
+	"""Advance the given state by applying the given changeset"""
+	# Verify that the given changeset "fits" on top of the given state
+	for rev in changeset:
+		prev_num = rev.num.parent()
+		state_num = state.get(rev.path)
+		if prev_num is None and state_num is None:
+			# 'rev' is the first revision of this path being added
+			state.add(rev.path, rev.num)
+		elif prev_num and state_num and prev_num == state_num:
+			if rev.deleted: # rev deletes path from state
+				state.remove(rev.path, prev_num)
+			else: # rev follows state's revision of this path
+				state.replace(rev.path, rev.num)
+		else:
+			error("Cannot apply changeset with %s:%s on top of " \
+			      "CVS state with %s:%s.",
+			      rev.path, changeset[rev.path].num,
+			      rev.path, state.get(rev.path))
+			error("    changeset: %s", changeset)
+			error("    CVS state: \n---\n%s---", state)
+			die("Failed to apply changeset. Aborting.")
+
+def revert_state (state, changeset):
+	"""Revert the given state to _before_ the given changeset is applied
+
+	This is the reverse of the above advance_state() function.
+	"""
+	for rev in changeset:
+		prev_num = rev.num.parent()
+		state_num = state.get(rev.path)
+		if state_num is None: # revert deletion of file
+			assert rev.deleted
+			state.add(rev.path, prev_num)
+		else:
+			assert state_num == rev.num
+			if prev_num is None: # revert addition of file
+				state.remove(rev.path, rev.num)
+			else: # regular revert to previous version
+				state.replace(rev.path, prev_num)
+
+def import_changesets (ref, changesets, from_state, to_state, progress):
+	"""Apply the given list of Changeset objects to the given ref
+
+	Also verify that the changesets bring us from the given from_state to
+	the given to_state.
+	"""
+	state = from_state
+	for i, c in enumerate(changesets):
+		advance_state(state, c)
+		progress("(%i/%i) Committing %s" % (i + 1, len(changesets), c))
+		# Make a git commit from changeset c
+		commitdata = GitFICommit(
+			c.author, # TODO: author_map handling
+			c.author + "@example.com", # TODO: author_map handling
+			c.date.ts,
+			c.date.tz_str(),
+			"".join(["%s\n" % (line) for line in c.message]),
+		)
+
+		for rev in c:
+			p, n = rev.path, rev.num
+			if rev.deleted:
+				commitdata.delete(p)
+				continue
+			blobname = Globals.CvsRevisionMap.get_blob(p, n)
+			mode = Globals.CvsRevisionMap.get_mode(p)
+			if mode is None: # Must retrieve mode from CVS checkout
+				debug("Retrieving mode info for '%s'" % (p))
+				Globals.CvsWorkDir.update(n, [p])
+				mode = Globals.CvsWorkDir.get_modeinfo([p])[p]
+				Globals.CvsRevisionMap.add_path(p, mode)
+			commitdata.modify(mode, blobname, p)
+
+		commitname = Globals.GitFastImport.commit(ref, commitdata)
+		Globals.CommitStates.add(
+			commitname, state, Globals.GitFastImport)
+		for path, revnum in state:
+			Globals.CvsRevisionMap.add_commit(
+				path, revnum, commitname)
+		assert commitname in Globals.CvsStateMap.get_commits(state)
+
+	assert state == to_state
+	return len(changesets)
+
+def import_cvs_symbol (cvs_symbol, progress):
+	"""Import the given CVS symbol from CVS to Git
+
+	Return False if nothing was imported, True otherwise.
+	"""
+	progress.pushprefix("%s: " % (cvs_symbol))
+
+	git_ref = cvs_to_refname(cvs_symbol)
+
+	# Verify that we are asked to import valid git ref names
+	if not valid_git_ref(git_ref):
+		progress("Invalid git ref '%s'. Skipping." % (git_ref), True)
+		progress.popprefix()
+		return False
+
+	# Retrieve previously imported CVS state
+	progress("Loading previously imported state of %s..." % (git_ref))
+	prev_commit = Globals.GitRefMap.get(git_ref)
+	prev_state = Globals.CommitStates.get(prev_commit, CvsState())
+
+	# Retrieve current CVS state of symbol
+	# Also: At some point we will need mode information for all CVS paths
+	# (stored in CvsRevisionMap). This information can be added for each
+	# path on demand (using CvsWorkDir.get_modeinfo()), but doing so may
+	# be an expensive process. It is much cheaper to load mode information
+	# for as many paths as possible in a _single_ operation. We do this
+	# below, by calling CvsRevisionMap.sync_modeinfo_from_cvs() in
+	# appropriate places
+	if Config.CachedSymbolsOnly:
+		progress("Synchronizing local CVS symbol cache for symbol...")
+		# The symbol cache is likely not up-to-date. Synchronize the
+		# given CVS symbol explicitly, to make sure we get the version
+		# current with the CVS server.
+		Globals.CvsSymbolCache.sync_symbol(
+			cvs_symbol, Globals.CvsWorkDir, progress)
+
+		# The above method updates the CVS workdir to the current CVS
+		# version. Hence, now is a convenient time to preload mode info
+		# from the currently checked-out CVS files. There may be more
+		# files for which we'll need mode information, but we'll deal
+		# with those when needed.
+		progress("Updating path mode info from current CVS checkout.")
+		Globals.CvsRevisionMap.sync_modeinfo_from_cvs(
+			Globals.CvsWorkDir)
+	elif not Globals.CvsRevisionMap: # There is no info for any paths, yet
+		# Pure optimization: We didn't get to preload all the mode info
+		# above. Normally, the only alternative is to load mode info for
+		# each path on-demand. However, if our CvsRevisionMap is
+		# currently empty, that's probably going to be very expensive.
+		# Therefore, in this case, do an explicit CVS update here, and
+		# preload mode info for all paths.
+		progress("Updating CVS checkout to sync path mode info.")
+		Globals.CvsWorkDir.update(cvs_symbol)
+		Globals.CvsRevisionMap.sync_modeinfo_from_cvs(
+			Globals.CvsWorkDir)
+
+	progress("Loading current CVS state...")
+	try: cur_state = Globals.CvsSymbolCache[cvs_symbol]
+	except KeyError:
+		progress("Couldn't find symbol '%s'. Skipping." % (cvs_symbol),
+			True)
+		progress.popprefix()
+		return False
+
+	# Optimization: Check if the previous import of this symbol is still
+	# up-to-date. If so, there's nothing more to be done.
+	progress("Checking if we're already up-to-date...")
+	if cur_state == prev_state:
+		progress("Already up-to-date. Skipping.", True)
+		progress.popprefix()
+		return False
+
+	progress("Fetching CVS revisions...")
+	cvs_revs = import_cvs_revs(cvs_symbol, prev_state, cur_state, progress)
+
+	# Organize CvsRevs into a chronological list of changesets
+	progress("Organizing revisions into changesets...")
+	changesets = build_changesets_from_revs(cvs_revs)
+
+	# When importing a new branch, try to optimize branch start point,
+	# instead of importing entire branch from scratch
+	if prev_commit is None:
+		progress("Finding startpoint for new symbol...")
+		i = len(changesets)
+		state = cur_state.copy()
+		for c in reversed(changesets):
+			commit = Globals.CvsStateMap.get_exact_commit(
+				state, Globals.CommitStates)
+			if commit is not None:
+				# We have found a commit that exactly matches the state after commit #i (changesets[i - 1])
+				Globals.GitFastImport.reset(git_ref, commit)
+				changesets = changesets[i:]
+				break
+			revert_state(state, c)
+			i -= 1
+
+	num_changesets = len(changesets)
+	num_applied = 0
+
+
+	# Apply changesets, bringing git_ref from prev_state to cur_state
+	if num_changesets:
+		progress("Importing changesets...")
+		num_applied = import_changesets(git_ref, changesets, prev_state,
+			cur_state, progress)
+
+	progress("Imported %i changesets (reused %i existing changesets)" % (
+		num_applied, num_changesets - num_applied), True)
+	progress.popprefix()
+	return True
+
+def do_import (*args):
+	"""Do the 'import' command; import refs from a remote"""
+	if not args: die_usage("'import' takes at least one parameter: ref...")
+
+	progress = ProgressIndicator("    ", sys.stderr)
+
+	cvs_symbols = map(ref_to_cvsname, args)
+	empty_import = True
+
+	for symbol in cvs_symbols:
+		if import_cvs_symbol(symbol, progress):
+			empty_import = False
+
+	if empty_import:
+		progress.finish("Everything up-to-date", True)
+		return 0
+
+	progress.finish("Finished importing %i CVS symbols to Git" % (
+		len(cvs_symbols)), True)
+	return 0
+
+def do_list (*args):
+	"""Do the 'list' command; list refs available from a CVS remote"""
+	if args: die_usage("'list' takes no parameters")
+
+	progress = ProgressIndicator("    ", sys.stderr)
+
+	if Config.CachedSymbolsOnly:
+		progress("Listing symbols in local symbol cache...", True)
+		for symbol in sorted(Globals.CvsSymbolCache):
+			print cvs_to_refname(symbol)
+		progress.finish()
+		print # terminate output with blank line
+		return 0
+
+	# Synchronize local symbol cache with CVS server
+	progress("Synchronizing local symbol cache with CVS server...")
+	Globals.CvsSymbolCache.sync_all_symbols(Config.CvsRepo, progress,
+		valid_cvs_symbol)
+
+	# Load current states of Git refs
+	progress("Loading current state of Git refs...")
+	changed, unchanged = 0, 0
+	for cvs_symbol, cvs_state in sorted(Globals.CvsSymbolCache.items()):
+		git_ref = cvs_to_refname(cvs_symbol)
+		progress("\tChecking if Git ref is up-to-date: %s" % (git_ref))
+		git_commit = Globals.GitRefMap.get(git_ref)
+		git_state = Globals.CommitStates.get(git_commit)
+		attrs = ""
+		if git_state and git_state == cvs_state:
+			attrs = " unchanged"
+			unchanged += 1
+		else:
+			git_commit = "?"
+			changed += 1
+		print "%s %s%s" % (git_commit, git_ref, attrs)
+
+	progress.finish("Found %i CVS symbols (%i changed, %i unchanged)" % (
+	                changed + unchanged, changed, unchanged))
+	print # terminate with blank line
+	return 0
+
+def do_capabilities (*args):
+	"""Do the 'capabilities' command; report supported features"""
+	if args: die_usage("'capabilities' takes no parameters")
+	print "import"
+	print "marks %s" % (work_path("marks"))
+#	print "export"
+#	print "export-branch"
+#	print "export-merges"
+	print # terminate with blank line
+	return 0
+
+def do_addsymbol (*args):
+	"""Do the 'addsymbol' command; add given CVS symbol to local cache"""
+	if len(args) != 1: die_usage("'addsymbol' takes one parameter: symbol")
+	symbol = args[0]
+
+	progress = ProgressIndicator("    ", sys.stderr)
+	if valid_cvs_symbol(symbol):
+		Globals.CvsSymbolCache.sync_symbol(
+			symbol, Globals.CvsWorkDir, progress)
+		progress.finish("Added '%s' to CVS symbol cache" % (symbol),
+			True)
+	else:
+		error("Skipping CVS symbol '%s'; it is not a valid git ref",
+			symbol)
+
+	print # terminate with blank line
+	return 0
+
+def do_syncsymbols (*args):
+	"""Do the 'syncsymbols' command; sync all symbols with CVS server"""
+	if args: die_usage("'syncsymbols' takes no parameters")
+	progress = ProgressIndicator("    ", sys.stderr)
+	Globals.CvsSymbolCache.sync_all_symbols(Config.CvsRepo, progress,
+		valid_cvs_symbol)
+	progress.finish()
+	print # terminate with blank line
+	return 0
+
+def do_verify (*args):
+	"""Do the 'verify' command; Compare CVS checkout and Git tree"""
+	if len(args) != 1: die_usage("'verify' takes one parameter: symbol")
+	symbol = args[0]
+	gitref = cvs_to_refname(symbol)
+
+	progress = ProgressIndicator("    ", sys.stderr)
+	assert valid_cvs_symbol(symbol)
+
+	progress("Checking out '%s' from CVS..." % (symbol))
+	Globals.CvsWorkDir.update(symbol)
+
+	add_env = {"GIT_INDEX_FILE": os.path.abspath(work_path("temp_index"))}
+	progress("Creating Git index from tree object @ '%s'..." % (gitref))
+	cmd = ("git", "read-tree", gitref)
+	assert run_command(cmd, add_env = add_env)[0] == 0
+
+	progress("Comparing CVS checkout to Git index...", True)
+	cmd = ("git", "--work-tree=%s" % (os.path.abspath(work_path("cvs"))),
+		"ls-files",
+		"--exclude=CVS", "--deleted", "--modified", "--others", "-t")
+	exit_code, output, errors = run_command(cmd, add_env = add_env)
+	assert exit_code == 0 and not errors
+
+	if output:
+		progress.finish("Failed verification of '%s'" % (symbol), True)
+		error("The '%s' command returned:\n---\n%s---", " ".join(cmd),
+			output)
+	else:
+		progress.finish("Successfully verified '%s'" % (symbol), True)
+
+	print # terminate with blank line
+	return exit_code
+
+def not_implemented (*args):
+	die_usage("Command not implemented")
+
+Commands = {
+	"capabilities": do_capabilities,
+	"list":         do_list,
+	# Special handling of 'import' in main()
+	# "import":       do_import,
+	"export":       not_implemented,
+	# Custom commands
+	"addsymbol":    do_addsymbol,
+	"syncsymbols":  do_syncsymbols,
+	"verify":       do_verify,
+}
+
+class Globals (object):
+	"""Global variables are placed here at the start of main()"""
+	pass
+
+def main (*args):
+	debug("Invoked '%s'", " ".join(args))
+
+	### Initialization of subsystems
+
+	# Read config for the given remote
+	assert len(args) >= 2
+	Config.init(args[1])
+
+	# Local CVS symbol cache (CVS symbol -> CVS state mapping)
+	Globals.CvsSymbolCache = CvsSymbolCache(work_path("symbols"))
+
+	# Local CVS checkout
+	Globals.CvsWorkDir = CvsWorkDir(work_path("cvs"), Config.CvsRepo)
+
+	# Interface to 'git cat-file --batch'
+	Globals.GitObjectFetcher = GitObjectFetcher()
+
+	# Interface to Git object notes
+	Globals.GitNotes = GitNotes(Config.NotesRef, Globals.GitObjectFetcher)
+
+	# Mapping from Git commit objects to CVS states
+	Globals.CommitStates = CommitStates(Globals.GitNotes)
+
+	# Mapping from Git ref names to Git object names
+	Globals.GitRefMap = GitRefMap(Globals.GitObjectFetcher)
+
+	# Mapping from CVS revision to Git blob and commit objects
+	Globals.CvsRevisionMap = CvsRevisionMap(
+		cvs_to_refname("_metadata"), Globals.GitObjectFetcher)
+	last_mark = 0
+	if Globals.CvsRevisionMap.has_unresolved_marks():
+		# Update with marks from last import
+		last_mark = Globals.CvsRevisionMap.load_marks_file(
+			work_path("marks"))
+	else:
+		# Truncate marks file. We cannot automatically do this after
+		# .load_marks_file() above, since we cannot yet guarantee that
+		# we will be able to save the revision map persistently. (That
+		# can only happen if we are given one or more import commands
+		# below.) We can only truncate this file when we know there are
+		# no unresolved marks in the revision map.
+		open(work_path("marks"), "w").close()
+
+	# Mapping from CVS states to commit objects that contain said state
+	Globals.CvsStateMap = CvsStateMap(Globals.CvsRevisionMap)
+
+	### Main program loop
+
+	import_refs = [] # accumulate import commands here
+	# cannot use "for line in sys.stdin" for buffering (?) reasons
+	line = sys.stdin.readline()
+	while (line):
+		cmdline = line.strip().split()
+		if not cmdline: break # blank line means we're about to quit
+
+		debug("Got command '%s'", " ".join(cmdline))
+		cmd = cmdline.pop(0)
+
+		if cmd == "import":
+			import_refs.extend(cmdline)
+		else:
+			if cmd not in Commands:
+				die_usage("Unknown command '%s'", cmd)
+			if Commands[cmd](*cmdline):
+				die("Command '%s' failed", line.strip())
+			sys.stdout.flush()
+		line = sys.stdin.readline()
+
+	ret = 0
+	if import_refs: # trigger import processing after last import command
+		# Init producer of output in the git-fast-import format
+		Globals.GitFastImport = GitFastImport(
+			sys.stdout, Globals.GitObjectFetcher, last_mark)
+
+		# Perform import of given refs
+		ret = do_import(*import_refs)
+
+		### Notes on persistent storage of subsystems' data structures:
+		#
+		# Because the "import" command has been called, we here _know_
+		# that there is a fast-import process running in parallel.
+		# (This is NOT the case when there are no "import" commands).
+		# We can therefore now (and only now) safely commit the extra
+		# information that we store in the Git repo.
+		# In other words, the data structures that we commit to
+		# persistent storage with the following calls will NOT be
+		# committed if there are no "import" commands. The data
+		# structures must handle this in one of two ways:
+		# - In the no-"import" scenario, there is simply nothing to
+		#   commit, so it can safely be skipped.
+		# - Any information that should have been committed in the
+		#   no-"import" scenario can be reconstructed repeatedly in
+		#   subsequent executions of this program, until the next
+		#   invocation of an "import" command provides an opportunity
+		#   to commit the data structure to persistent storage.
+
+		# Write out commit notes (mapping git commits to CvsStates)
+		# The following call would be a no-op in the no-"import" case
+		Globals.GitNotes.commit_notes(
+			Globals.GitFastImport, Config.Author,
+			'Annotate commits imported by "git remote-cvs"\n')
+
+		# Save CVS revision metadata
+		# This data structure can handle the no-"import" case as long
+		# as the marks file from the last fast-import run is still
+		# present upon the next execution of this program.
+		Globals.CvsRevisionMap.commit_map(
+			Globals.GitFastImport, Config.Author,
+			'Updated metadata used by "git remote-cvs"\n')
+
+	return ret
+
+if __name__ == '__main__':
+	sys.exit(main(*sys.argv))
-- 
1.6.4.rc3.138.ga6b98.dirty
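
Note on the command loop: main() above reads one command per line from stdin,
stops at the first blank line, runs ordinary commands immediately, and
collects the arguments of every "import" command so that they are processed
together afterwards. The sketch below isolates that dispatch pattern so it can
be tried without a CVS server. It is an illustration only. run_commands() and
the handlers dict are hypothetical names, exceptions stand in for the
die()/die_usage() calls of the helper, and the code targets Python 3 rather
than the Python 2 used by the patch.

```python
import io


def run_commands(stream, handlers):
    """Dispatch helper commands from stream; return the batched import refs.

    handlers maps a command name to a callable that returns 0 on success,
    mirroring the Commands table in git-remote-cvs.py.
    """
    import_refs = []
    # readline() returns "" at EOF, which ends the iteration.
    for line in iter(stream.readline, ""):
        words = line.split()
        if not words:
            break  # a blank line ends the command stream
        cmd, args = words[0], words[1:]
        if cmd == "import":
            import_refs.extend(args)  # deferred until all commands are read
        elif cmd not in handlers:
            raise ValueError("Unknown command '%s'" % cmd)
        elif handlers[cmd](*args):
            raise RuntimeError("Command '%s' failed" % line.strip())
    return import_refs


if __name__ == "__main__":
    demo = io.StringIO("import refs/cvs/origin/HEAD\n\n")
    print(run_commands(demo, {}))
```

Deferring the imports means the fast-import stream is only set up once, after
the last "import" line has been read, which matches the comment in main()
about triggering import processing after the last import command.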
