[PATCH] Add checksumming feature to copyfile module

Marcus Lauer <marcuslauer@xxxxxxxxxx> · Fri, 26 Nov 2010 12:52:15 -0500

I have added checksumming to the current "chunked" implementation of 
copyfile.  If the source file on the Overlord and target file on the Minion 
have the same checksum, the target is not overwritten.

        My understanding is that copyfile originally calculated a checksum for 
the source and target files and only overwrote the target if the checksums 
differed.  The "copyfile" method of the CopyFile object on the Minion side 
handled this.

        Currently copyfile always overwrites the target file.  It appears that 
when copyfile was modified to write files in chunks (avoiding out-of-memory 
situations when copying large files) this feature vanished.  Instead of 
calling a single "copyfile" method there are two methods: "open" to truncate 
(or create) the target file and "append" to keep adding data to it.  Both of 
these methods are always applied.

        I have changed the "open" method so that it compares the checksums of 
the source and target.  It just returns zero and does not truncate the target 
file if the checksums match.  If they do not match it still truncates the 
target and returns the file path, as before.

        I have changed the "append" method to look up the return code from 
"open" each time it is called.  If the return code was zero or any number 
instead of the file path, it does not append anything to the target file.  
Note that "append" is still called repeatedly even on a Minion which is not 
going to do any appending to that particular file.  This is inefficient but it 
works.

        There should be a better way of adding the checksumming code to both 
the Overlord and Minion copyfile.py files.  I'll look into that next.  Also, 
it should now be relatively easy to re-implement the --force argument (skip 
checksumming) and to create a new --backup argument (backup the target before 
overwriting like the original copyfile method did).

        Based on the experience of writing this code I would like to make a 
feature suggestion.  It would be useful if the "run" method in the "Overlord" 
class could take an optional "mask" argument which could instruct it to skip 
certain Minions.  The mask could be a dictionary with minion names as the keys 
and something like True/False as the data.  Coincidentally this is almost 
exactly the same structure which is returned by the "run" method in the 
"Overlord" class itself.  This feature would make it easy and efficient to 
write functions which have multiple "run" calls which are dependent upon one 
another.

--
Marcus

diff -rupN func/func//minion/modules/copyfile.py func.new/func//minion/modules/copyfile.py

--- func/func//minion/modules/copyfile.py	2010-11-25 20:16:50.000000000 -0500
+++ func.new/func//minion/modules/copyfile.py	2010-11-25 19:55:28.000000000 -0500
@@ -26,6 +26,7 @@ import time
 import shutil
 
 import func_module
+from func import utils as func_utils
 
 
 class CopyFile(func_module.FuncModule):
@@ -40,28 +41,35 @@ class CopyFile(func_module.FuncModule):
         return thissum.hexdigest()
 
     def checksum(self, thing):
-
         CHUNK=2**16
         thissum = hashlib.new('sha1')
         if os.path.exists(thing):
-            fo = open(thing, 'r', CHUNK)
-            chunk = fo.read
-            while chunk:
+            fo = open(thing, 'r')
+            while True:
                 chunk = fo.read(CHUNK)
+                if not chunk:
+                    break
                 thissum.update(chunk)
             fo.close()
             del fo
         else:
-            # assuming it's a string of some kind
+	    # assuming it's a string of some kind
             thissum.update(thing)
 
-        return thissum.hexdigest()
+        hexdig = thissum.hexdigest()
+        return hexdig
 
-    def open(self, filepath, mode=None, uid=-1, gid=-1):
+    def open(self, filepath, remote_sum, mode=None, uid=-1, gid=-1):
         dirpath = os.path.dirname(filepath)
         if not os.path.exists(dirpath):
             os.makedirs(dirpath)
 
+        if os.path.exists(filepath):
+            local_sum = self.checksum(filepath)
+
+	    if remote_sum == local_sum:
+                return 0
+
         # Create empty file
         try:
             fo = open(filepath, 'w')
@@ -82,7 +90,14 @@ class CopyFile(func_module.FuncModule):
 
         return filepath
 
-    def append(self, filepath, filebuf):
+    def append(self, open_result, filebuf):
+        hostname = func_utils.get_hostname_by_route()
+        filepath = open_result[hostname]
+
+        # If the result of the open function was 0 (or -1), do not append, return the same value.
+	if type(filepath) == type(0):
+	    return filepath
+
         if not os.path.exists(filepath):
             # file disaperead
             return -1
diff -rupN func/func//overlord/modules/copyfile.py func.new/func//overlord/modules/copyfile.py
--- func/func//overlord/modules/copyfile.py	2010-11-25 20:16:50.000000000 -0500
+++ func.new/func//overlord/modules/copyfile.py	2010-11-25 19:42:51.000000000 -0500
@@ -4,9 +4,40 @@ import stat
 import sys
 import xmlrpclib
 
+try:
+    import hashlib
+except ImportError:
+    # Python-2.4.z ... gah! (or even 2.3!)
+    import sha
+    class hashlib:
+        @staticmethod
+        def new(algo):
+            if algo == 'sha1':
+                return sha.new()
+            raise ValueError, "Bad checksum type"
+
 from func.overlord import overlord_module
 
 class copyfile(overlord_module.BaseModule):
+    def checksum(self, thing):
+        CHUNK=2**16
+        thissum = hashlib.new('sha1')
+        if os.path.exists(thing):
+            fo = open(thing, 'r')
+            while True:
+                chunk = fo.read(CHUNK)
+		if not chunk:
+		    break
+                thissum.update(chunk)
+            fo.close()
+            del fo
+        else:
+            # assuming it's a string of some kind
+            thissum.update(thing)
+
+        hexdig = thissum.hexdigest()
+        return hexdig
+
     def send(self, localpath, remotepath, bufsize=60000):
         try:
             f = open(localpath, "r")
@@ -18,13 +49,14 @@ class copyfile(overlord_module.BaseModul
         mode = stat.S_IMODE(st.st_mode)
         uid = st.st_uid
         gid = st.st_gid
+	local_sum = self.checksum(localpath)
 
-        self.parent.run("copyfile", "open", [remotepath, mode, uid, gid])
+        open_result = self.parent.run("copyfile", "open", [remotepath, local_sum, mode, uid, gid])
 
         while True:
             data=f.read(bufsize)
             if data:
-                self.parent.run("copyfile", "append", [remotepath, xmlrpclib.Binary(data)])
+                self.parent.run("copyfile", "append", [open_result, xmlrpclib.Binary(data)])
             else:
                 break
 
_______________________________________________
Func-list mailing list
Func-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/func-list