I have added checksumming to the current "chunked" implementation of copyfile. If the source file on the Overlord and target file on the Minion have the same checksum, the target is not overwritten. My understanding is that copyfile originally calculated a checksum for the source and target files and only overwrote the target if the checksums differed. The "copyfile" method of the CopyFile object on the Minion side handled this. Currently copyfile always overwrites the target file. It appears that when copyfile was modified to write files in chunks (avoiding out-of-memory situations when copying large files) this feature vanished. Instead of calling a single "copyfile" method there are two methods: "open" to truncate (or create) the target file and "append" to keep adding data to it. Both of these methods are always applied. I have changed the "open" method so that it compares the checksums of the source and target. It just returns zero and does not truncate the target file if the checksums match. If they do not match it still truncates the target and returns the file path, as before. I have changed the "append" method to look up the return code from "open" each time it is called. If the return code was zero or any number instead of the file path, it does not append anything to the target file. Note that "append" is still called repeatedly even on a Minion which is not going to do any appending to that particular file. This is inefficient but it works. There should be a better way of adding the checksumming code to both the Overlord and Minion copyfile.py files. I'll look into that next. Also, it should now be relatively easy to re-implement the --force argument (skip checksumming) and to create a new --backup argument (backup the target before overwriting like the original copyfile method did). Based on the experience of writing this code I would like to make a feature suggestion. It would be useful if the "run" method in the "Overlord" class could take an optional "mask" argument which could instruct it to skip certain Minions. The mask could be a dictionary with minion names as the keys and something like True/False as the data. Coincidentally this is almost exactly the same structure which is returned by the "run" method in the "Overlord" class itself. This feature would make it easy and efficient to write functions which have multiple "run" calls which are dependent upon one another. -- Marcus
diff -rupN func/func//minion/modules/copyfile.py func.new/func//minion/modules/copyfile.py --- func/func//minion/modules/copyfile.py 2010-11-25 20:16:50.000000000 -0500 +++ func.new/func//minion/modules/copyfile.py 2010-11-25 19:55:28.000000000 -0500 @@ -26,6 +26,7 @@ import time import shutil import func_module +from func import utils as func_utils class CopyFile(func_module.FuncModule): @@ -40,28 +41,35 @@ class CopyFile(func_module.FuncModule): return thissum.hexdigest() def checksum(self, thing): - CHUNK=2**16 thissum = hashlib.new('sha1') if os.path.exists(thing): - fo = open(thing, 'r', CHUNK) - chunk = fo.read - while chunk: + fo = open(thing, 'r') + while True: chunk = fo.read(CHUNK) + if not chunk: + break thissum.update(chunk) fo.close() del fo else: - # assuming it's a string of some kind + # assuming it's a string of some kind thissum.update(thing) - return thissum.hexdigest() + hexdig = thissum.hexdigest() + return hexdig - def open(self, filepath, mode=None, uid=-1, gid=-1): + def open(self, filepath, remote_sum, mode=None, uid=-1, gid=-1): dirpath = os.path.dirname(filepath) if not os.path.exists(dirpath): os.makedirs(dirpath) + if os.path.exists(filepath): + local_sum = self.checksum(filepath) + + if remote_sum == local_sum: + return 0 + # Create empty file try: fo = open(filepath, 'w') @@ -82,7 +90,14 @@ class CopyFile(func_module.FuncModule): return filepath - def append(self, filepath, filebuf): + def append(self, open_result, filebuf): + hostname = func_utils.get_hostname_by_route() + filepath = open_result[hostname] + + # If the result of the open function was 0 (or -1), do not append, return the same value. + if type(filepath) == type(0): + return filepath + if not os.path.exists(filepath): # file disaperead return -1 diff -rupN func/func//overlord/modules/copyfile.py func.new/func//overlord/modules/copyfile.py --- func/func//overlord/modules/copyfile.py 2010-11-25 20:16:50.000000000 -0500 +++ func.new/func//overlord/modules/copyfile.py 2010-11-25 19:42:51.000000000 -0500 @@ -4,9 +4,40 @@ import stat import sys import xmlrpclib +try: + import hashlib +except ImportError: + # Python-2.4.z ... gah! (or even 2.3!) + import sha + class hashlib: + @staticmethod + def new(algo): + if algo == 'sha1': + return sha.new() + raise ValueError, "Bad checksum type" + from func.overlord import overlord_module class copyfile(overlord_module.BaseModule): + def checksum(self, thing): + CHUNK=2**16 + thissum = hashlib.new('sha1') + if os.path.exists(thing): + fo = open(thing, 'r') + while True: + chunk = fo.read(CHUNK) + if not chunk: + break + thissum.update(chunk) + fo.close() + del fo + else: + # assuming it's a string of some kind + thissum.update(thing) + + hexdig = thissum.hexdigest() + return hexdig + def send(self, localpath, remotepath, bufsize=60000): try: f = open(localpath, "r") @@ -18,13 +49,14 @@ class copyfile(overlord_module.BaseModul mode = stat.S_IMODE(st.st_mode) uid = st.st_uid gid = st.st_gid + local_sum = self.checksum(localpath) - self.parent.run("copyfile", "open", [remotepath, mode, uid, gid]) + open_result = self.parent.run("copyfile", "open", [remotepath, local_sum, mode, uid, gid]) while True: data=f.read(bufsize) if data: - self.parent.run("copyfile", "append", [remotepath, xmlrpclib.Binary(data)]) + self.parent.run("copyfile", "append", [open_result, xmlrpclib.Binary(data)]) else: break
_______________________________________________ Func-list mailing list Func-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/func-list