Attached are my original patch and a git-patch which contains the same
code. These implement changes I called #1 in my previous e-mail.
#2 doesn't seem too hard. CopyFile.open will have to open a tempfile
rather than the target file. It already returns the filepath on success.
Append already uses the return value of CopyFile.open as the file to append
to. After appending has finished there needs to be a CopyFile.move function
called by the overlord which takes both paths and moves the tempfile over the
source file. Making sure the ownership/permissions/SELinux attributes of the
target file do not change could be is important so had better be careful about
that.
Once this is in place #3 is almost trivial. The overlord needs to
calculate the MD5 of the source file. The code for doing this is also needed
by #1 and is included in my patches. That MD5 should be passed to the
CopyFile.move function. It should calculate the MD5 of the tempfile and if
they don't match just delete the tempfile and return an error instead of
moving it. I figure that this MD5 check should be optional so I will also
create a flag to enable it.
Can anyone see any issues with this general design? If not I'll get
working on it when I have the time.
--
Marcus
On Friday, August 24, 2012 07:32:14 PM you wrote:
> Copyfile could most certainly use some TLC, so let me know if I can help,
> that would certainly be of help if you'd like to work on that. The list
> has indeed changed to lists.fedorahosted.com, so definitely use that one.
>
> Would you mind shooting me your original patch as well? Thanks very much!
>
> On Fri, Aug 24, 2012 at 7:22 PM, Marcus Lauer <marcuslauer@xxxxxxxxxx>wrote:
> > It seems to me that there are at least three
> > improvements which
> >
> > should be made to copyfile:
> >
> > 1. If the file on the overlord and minion are the same, that file should
> > not
> > be copied. This saves bandwidth/processing time.
> >
> > 2. Overwriting the file on the minion should be atomic, e.g. the file on
> > the
> > overlord should be copied to a tempfile on the minion then moved to
> > overwrite
> > the target file as a final step. Right now if funcd dies or something
> > goes wrong mid-copy you end up with half a file on the minion.
> >
> > 3. There should be an option (off by default) such that if the file
> > which
> > has
> > been copied over to the minion is not identical to the file on the
> > overlord it
> > is re-copied, or at least does not overwrite the file on the minion.
> > This won't work for some files (busy logfiles) but for most files it
> > seems like an
> > extra bit of reliability which most people would want in most
> > situations.
> >
> > I submitted a patch (though not a git-patch... my bad!)
> > back in
> >
> > January 2011 which implements #1, but imperfectly. The overlord still
> > has to
> > send a (small) packet to the minion for each 60KB chunk of target file.
> >
> > Would anyone be bothered terribly if I worked on #2 and
> > #3? While
> >
> > I am
> > at it I will try to find a way to fix up my patch for #1 as well. There
> > is some overlap between these patches. Both will involve calculating
> > an MD5 hash
> > of the file on the overlord, for example.
> >
> > P.S. Is the new mailing list on fedorahosted.org live
> > yet? I am
> >
> > sending this message to both just in case.
> >
> > --
> > Marcus
> > _______________________________________________
> > func mailing list
> > func@xxxxxxxxxxxxxxxxxxxxxx
> > https://lists.fedorahosted.org/mailman/listinfo/func
>
> --
> Steve Salevan
> steve@xxxxxxxxxx
diff -rupN func/minion/modules/copyfile.py func.new/minion/modules/copyfile.py
--- func/minion/modules/copyfile.py 2011-01-01 11:45:52.000000000 -0500
+++ func.new/minion/modules/copyfile.py 2011-01-01 11:48:11.000000000 -0500
@@ -26,6 +26,7 @@ import time
import shutil
import func_module
+from func import utils as func_utils
class CopyFile(func_module.FuncModule):
@@ -40,28 +41,39 @@ class CopyFile(func_module.FuncModule):
return thissum.hexdigest()
def checksum(self, thing):
-
CHUNK=2**16
thissum = hashlib.new('sha1')
if os.path.exists(thing):
- fo = open(thing, 'r', CHUNK)
- chunk = fo.read
- while chunk:
+ fo = open(thing, 'r')
+ while True:
chunk = fo.read(CHUNK)
+ if not chunk:
+ break
thissum.update(chunk)
fo.close()
del fo
else:
- # assuming it's a string of some kind
+ # assuming it's a string of some kind
thissum.update(thing)
- return thissum.hexdigest()
+ hexdig = thissum.hexdigest()
+ return hexdig
- def open(self, filepath, mode=None, uid=-1, gid=-1):
+ def open(self, filepath, remote_sum, mode=None, uid=-1, gid=-1, backup=False, force=False):
dirpath = os.path.dirname(filepath)
if not os.path.exists(dirpath):
os.makedirs(dirpath)
+ if os.path.exists(filepath):
+ if not force:
+ local_sum = self.checksum(filepath)
+ if remote_sum == local_sum:
+ return 0
+
+ if backup:
+ if not self._backuplocal(filepath):
+ return -1
+
# Create empty file
try:
fo = open(filepath, 'w')
@@ -82,7 +94,14 @@ class CopyFile(func_module.FuncModule):
return filepath
- def append(self, filepath, filebuf):
+ def append(self, open_result, filebuf):
+ hostname = func_utils.get_hostname_by_route()
+ filepath = open_result[hostname]
+
+ # If the result of the open function was 0 (or -1), do not append, return the same value.
+ if type(filepath) == type(0):
+ return filepath
+
if not os.path.exists(filepath):
# file disaperead
return -1
diff -rupN func/overlord/cmd_modules/copyfile.py func.new/overlord/cmd_modules/copyfile.py
--- func/overlord/cmd_modules/copyfile.py 2011-01-01 11:45:52.000000000 -0500
+++ func.new/overlord/cmd_modules/copyfile.py 2011-01-01 11:48:11.000000000 -0500
@@ -34,6 +34,8 @@ class CopyFile(base_command.BaseCommand)
action="store")
self.parser.add_option("", "--remotepath", dest="remotepath",
action="store")
+ self.parser.add_option("", "--backup", dest="backup",
+ action="store_true")
self.parser.add_option("", "--force", dest="force",
action="store_true")
self.parser.add_option("-v", "--verbose", dest="verbose",
@@ -50,4 +52,4 @@ class CopyFile(base_command.BaseCommand)
self.server_spec = self.parentCommand.server_spec
self.getOverlord()
- return self.overlord_obj.local.copyfile.send(self.options.filename, self.options.remotepath)
+ return self.overlord_obj.local.copyfile.send(self.options.filename, self.options.remotepath, self.options.backup, self.options.force)
diff -rupN func/overlord/modules/copyfile.py func.new/overlord/modules/copyfile.py
--- func/overlord/modules/copyfile.py 2011-01-01 12:08:48.000000000 -0500
+++ func.new/overlord/modules/copyfile.py 2011-01-01 12:10:49.000000000 -0500
@@ -7,7 +7,7 @@ import xmlrpclib
from func.overlord import overlord_module
class copyfile(overlord_module.BaseModule):
- def send(self, localpath, remotepath, bufsize=60000):
+ def send(self, localpath, remotepath, backup=None, force=None, bufsize=60000):
try:
f = open(localpath, "r")
except IOError, e:
@@ -19,12 +19,19 @@ class copyfile(overlord_module.BaseModul
uid = st.st_uid
gid = st.st_gid
- self.parent.run("copyfile", "open", [remotepath, mode, uid, gid])
+ if force:
+ local_sum = -1
+ else:
+ import func.minion.modules.copyfile as CopyFile
+ cf = CopyFile.CopyFile()
+ local_sum = cf.checksum(localpath)
+
+ open_result = self.parent.run("copyfile", "open", [remotepath, local_sum, mode, uid, gid, backup, force])
while True:
data=f.read(bufsize)
if data:
- self.parent.run("copyfile", "append", [remotepath, xmlrpclib.Binary(data)])
+ self.parent.run("copyfile", "append", [open_result, xmlrpclib.Binary(data)])
else:
break
>From b05f6e4f93769b5082bba25d3bcba6c6f6abcd43 Mon Sep 17 00:00:00 2001
From: Marcus Lauer <marcuslauer@xxxxxxxxxx>
Date: Sat, 25 Aug 2012 16:27:13 -0400
Subject: [PATCH] Copyfile should check MD5 of target and source and not copy
if identical
---
func/minion/modules/copyfile.py | 35 +++++++++++++++++++++++++-------
func/overlord/cmd_modules/copyfile.py | 4 ++-
func/overlord/modules/copyfile.py | 13 +++++++++--
3 files changed, 40 insertions(+), 12 deletions(-)
diff --git a/func/minion/modules/copyfile.py b/func/minion/modules/copyfile.py
index 61c6a4f..e5bf5da 100644
--- a/func/minion/modules/copyfile.py
+++ b/func/minion/modules/copyfile.py
@@ -26,6 +26,7 @@ import time
import shutil
import func_module
+from func import utils as func_utils
class CopyFile(func_module.FuncModule):
@@ -40,28 +41,39 @@ class CopyFile(func_module.FuncModule):
return thissum.hexdigest()
def checksum(self, thing):
-
CHUNK=2**16
thissum = hashlib.new('sha1')
if os.path.exists(thing):
- fo = open(thing, 'r', CHUNK)
- chunk = fo.read
- while chunk:
+ fo = open(thing, 'r')
+ while True:
chunk = fo.read(CHUNK)
+ if not chunk:
+ break
thissum.update(chunk)
fo.close()
del fo
else:
- # assuming it's a string of some kind
+ # assuming it's a string of some kind
thissum.update(thing)
- return thissum.hexdigest()
+ hexdig = thissum.hexdigest()
+ return hexdig
- def open(self, filepath, mode=None, uid=-1, gid=-1):
+ def open(self, filepath, remote_sum, mode=None, uid=-1, gid=-1, backup=False, force=False):
dirpath = os.path.dirname(filepath)
if not os.path.exists(dirpath):
os.makedirs(dirpath)
+ if os.path.exists(filepath):
+ if not force:
+ local_sum = self.checksum(filepath)
+ if remote_sum == local_sum:
+ return 0
+
+ if backup:
+ if not self._backuplocal(filepath):
+ return -1
+
# Create empty file
try:
fo = open(filepath, 'w')
@@ -82,7 +94,14 @@ class CopyFile(func_module.FuncModule):
return filepath
- def append(self, filepath, filebuf):
+ def append(self, open_result, filebuf):
+ hostname = func_utils.get_hostname_by_route()
+ filepath = open_result[hostname]
+
+ # If the result of the open function was 0 (or -1), do not append, return the same value.
+ if type(filepath) == type(0):
+ return filepath
+
if not os.path.exists(filepath):
# file disaperead
return -1
diff --git a/func/overlord/cmd_modules/copyfile.py b/func/overlord/cmd_modules/copyfile.py
index 3f2263d..a5dda6c 100644
--- a/func/overlord/cmd_modules/copyfile.py
+++ b/func/overlord/cmd_modules/copyfile.py
@@ -34,6 +34,8 @@ class CopyFile(base_command.BaseCommand):
action="store")
self.parser.add_option("", "--remotepath", dest="remotepath",
action="store")
+ self.parser.add_option("", "--backup", dest="backup",
+ action="store_true")
self.parser.add_option("", "--force", dest="force",
action="store_true")
self.parser.add_option("-v", "--verbose", dest="verbose",
@@ -50,4 +52,4 @@ class CopyFile(base_command.BaseCommand):
self.server_spec = self.parentCommand.server_spec
self.getOverlord()
- return self.overlord_obj.local.copyfile.send(self.options.filename, self.options.remotepath)
+ return self.overlord_obj.local.copyfile.send(self.options.filename, self.options.remotepath, self.options.backup, self.options.force)
diff --git a/func/overlord/modules/copyfile.py b/func/overlord/modules/copyfile.py
index 9f385a9..f13fc9b 100644
--- a/func/overlord/modules/copyfile.py
+++ b/func/overlord/modules/copyfile.py
@@ -7,7 +7,7 @@ import xmlrpclib
from func.overlord import overlord_module
class copyfile(overlord_module.BaseModule):
- def send(self, localpath, remotepath, bufsize=60000):
+ def send(self, localpath, remotepath, backup=None, force=None, bufsize=60000):
try:
f = open(localpath, "r")
except IOError, e:
@@ -19,12 +19,19 @@ class copyfile(overlord_module.BaseModule):
uid = st.st_uid
gid = st.st_gid
- self.parent.run("copyfile", "open", [remotepath, mode, uid, gid])
+ if force:
+ local_sum = -1
+ else:
+ import func.minion.modules.copyfile as CopyFile
+ cf = CopyFile.CopyFile()
+ local_sum = cf.checksum(localpath)
+
+ open_result = self.parent.run("copyfile", "open", [remotepath, local_sum, mode, uid, gid, backup, force])
while True:
data=f.read(bufsize)
if data:
- self.parent.run("copyfile", "append", [remotepath, xmlrpclib.Binary(data)])
+ self.parent.run("copyfile", "append", [open_result, xmlrpclib.Binary(data)])
else:
break
--
1.7.7
_______________________________________________
func mailing list
func@xxxxxxxxxxxxxxxxxxxxxx
https://lists.fedorahosted.org/mailman/listinfo/func