On 12 September 2016 at 23:02, Ori Rawlings <orirawlings@xxxxxxxxx> wrote:
> Importing a long history from Perforce into git using the git-p4 tool
> can be especially challenging. The `git p4 clone` operation is based
> on an all-or-nothing transactionality guarantee. Under real-world
> conditions like network unreliability or a busy Perforce server,
> `git p4 clone` and `git p4 sync` operations can easily fail, forcing a
> user to restart the import process from the beginning. The longer the
> history being imported, the more likely a fault occurs during the
> process. Long enough imports thus become statistically unlikely to ever
> succeed.

That would never happen :-)

The usual thing that I find is that my Perforce login session expires.

>
> The underlying git fast-import protocol supports an explicit checkpoint
> command. The idea here is to optionally allow the user to force an
> explicit checkpoint every <x> seconds. If the sync/clone operation fails
> branches are left updated at the appropriate commit available during the
> latest checkpoint. This allows a user to resume importing Perforce
> history while only having to repeat at most approximately <x> seconds
> worth of import activity.

I think this ought to work, and could be quite useful. It would be good
to have some kind of test case for it though, and updated
documentation.

Luke

>
> Signed-off-by: Ori Rawlings <orirawlings@xxxxxxxxx>
> ---
>  git-p4.py | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index fd5ca52..40cb64f 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -2244,6 +2244,7 @@ class P4Sync(Command, P4UserMap):
>                  optparse.make_option("-/", dest="cloneExclude",
>                                       action="append", type="string",
>                                       help="exclude depot path"),
> +                optparse.make_option("--checkpoint-period", dest="checkpointPeriod", type="int", help="Period in seconds between explict git fast-import checkpoints (by default, no explicit checkpoints are performed)"),
>          ]
>          self.description = """Imports from Perforce into a git repository.\n
>      example:
> @@ -2276,6 +2277,7 @@ class P4Sync(Command, P4UserMap):
>          self.tempBranches = []
>          self.tempBranchLocation = "refs/git-p4-tmp"
>          self.largeFileSystem = None
> +        self.checkpointPeriod = -1

Or use None?

>
>          if gitConfig('git-p4.largeFileSystem'):
>              largeFileSystemConstructor = globals()[gitConfig('git-p4.largeFileSystem')]
> @@ -3031,6 +3033,8 @@ class P4Sync(Command, P4UserMap):
>
>      def importChanges(self, changes):
>          cnt = 1
> +        if self.checkpointPeriod > -1:
> +            self.lastCheckpointTime = time.time()

Could you just always set the lastCheckpointTime?

>          for change in changes:
>              description = p4_describe(change)
>              self.updateOptionDict(description)
> @@ -3107,6 +3111,10 @@ class P4Sync(Command, P4UserMap):
>                                  self.initialParent)
>                      # only needed once, to connect to the previous commit
>                      self.initialParent = ""
> +
> +                    if self.checkpointPeriod > -1 and time.time() - self.lastCheckpointTime > self.checkpointPeriod:
> +                        self.checkpoint()
> +                        self.lastCheckpointTime = time.time()

If you use time.time(), then this could fail to work as expected if
the system time goes backwards (e.g. NTP updates). However, Python 2
doesn't have access to clock_gettime() without jumping through hoops,
so perhaps we leave this as a bug until git-p4 gets ported to Python
3.

>              except IOError:
>                  print self.gitError.read()
>                  sys.exit(1)
> --
> 2.7.4 (Apple Git-66)
>
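
For illustration only, here is a rough sketch of how the three review points above (None instead of -1, always setting the last checkpoint time, and a clock that cannot go backwards) might fit together once Python 3 is available. It is not part of the patch: `PeriodicCheckpointer`, `maybe_checkpoint` and the `clock` parameter are hypothetical names, the in-memory stream stands in for the pipe to `git fast-import`, and `time.monotonic()` assumes Python 3.3 or later. The `checkpoint` command text follows the git fast-import stream format.

```python
import io
import time


class PeriodicCheckpointer:
    """Hypothetical helper: emit a fast-import 'checkpoint' at most once per period."""

    def __init__(self, stream, period=None, clock=time.monotonic):
        self.stream = stream   # text stream feeding `git fast-import`
        self.period = period   # seconds; None means no explicit checkpoints
        self.clock = clock     # monotonic by default, unaffected by NTP steps
        self.last = clock()    # always set, even when checkpoints are disabled

    def maybe_checkpoint(self):
        """Write a checkpoint command if the period has elapsed; return True if written."""
        if self.period is None:
            return False
        now = self.clock()
        if now - self.last <= self.period:
            return False
        self.stream.write("checkpoint\n\n")
        self.stream.flush()
        self.last = now
        return True


if __name__ == "__main__":
    # Drive the helper with a fake clock so the behaviour is deterministic.
    fake_time = [0.0]
    out = io.StringIO()
    cp = PeriodicCheckpointer(out, period=10, clock=lambda: fake_time[0])
    for t in (5.0, 11.0, 15.0, 22.0):
        fake_time[0] = t
        cp.maybe_checkpoint()
    # Checkpoints fire at t=11 and t=22 only.
    print(out.getvalue().count("checkpoint"))
```

Passing the clock in as a parameter is just a convenience for testing; in git-p4 itself the call would presumably sit where the patch puts it, after each imported change.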