Hello everyone!

I only just found time to work on this again. Sorry for replying so
late, but I wanted to send a single reply with the conclusion if
possible.

I feel very humbled by all your replies; thanks a lot for your time and
for looking into my question!

Eric's script ships not only in Debian but also in CentOS 7 (and I guess
Red Hat 7 too), in /usr/share/doc/rsync-*/support/git-set-file-times.

My use case is similar to his: a cluster of identically configured web
servers with autoscaling (tested up to 100 servers) which, when they
boot (or when there is a new version of any of the websites), rsync the
current version from another server.

Currently, if we build the system image of the web servers in tandem
with the central server, everything works smoothly. The problem comes
when we recreate any of the pre-saved images from scratch: the file
dates then mismatch, and we get unnecessary rsync checksumming when the
image is put into production.

From now on I will use Eric's script from CentOS 7 as-is, to avoid the
mismatch and to mix pre-saved VM images without issues (slow
autoscaling).

Thanks a lot to you all! Let me know if any of you come to Uruguay,
the beers are on me!

Have a great day.


On Sat, Aug 29, 2020 at 01:48, Eric Wong (e@xxxxxxxx) wrote:
>
> Ivan Baldo <ibaldo@xxxxxxxxx> wrote:
> > Hello.
> > I know this is not standard usage of git, but I need a way to have
> > more stable dates and times in the files in order to avoid rsync
> > checksumming.
> > So I found this
> > https://stackoverflow.com/questions/2179722/checking-out-old-file-with-original-create-modified-timestamps/2179876#2179876
> > and modified it a bit to run in CentOS 7:
> >
> > IFS="
> > "
> > for FILE in $(git ls-files -z | tr '\0' '\n')
> > do
> >     TIME=$(git log --pretty=format:%cd -n 1 --date=iso -- "$FILE")
> >     touch -c -m -d "$TIME" "$FILE"
> > done
> >
> > Unfortunately it takes ages for a 84k files repo.
> > I see the CPU usage is dominated by the git log command.
>
> running git log for each file isn't necessary.
>
> On Debian, rsync actually ships the `git-set-file-times' script
> in /usr/share/doc/rsync/scripts/ which only runs `git log' once
> and parses it.
>
> You can also get my (original) version from:
> https://yhbt.net/git-set-file-times
>
> > I know a way I could use to split the work for all the CPU threads
> > but anyway, I would like to know if you guys and girls know of a
> > faster way to do this.
>
> Much of your overhead is going to be from process spawning.
> My Perl version reduces that significantly.
>
> I haven't tried it with 84K files, but it'll have to keep all
> those filenames in memory. I'm not sure if parallelizing
> utime() syscalls is worth it, either; maybe it helps on SSD
> more than HDD.


-- 
Ivan Baldo - ibaldo@xxxxxxxxx - http://ibaldo.codigolibre.net/
Freelance C++/PHP programmer and GNU/Linux systems administrator.
The sky isn't the limit!
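
For anyone who finds this thread later: the single-pass idea described
above (run `git log` once and parse its output, instead of once per
file) can be sketched roughly as follows. This is not Eric's Perl
script, only a minimal shell illustration under some assumptions. The
function name apply_commit_times and the "@" marker in the log format
are made up for this sketch. It relies on GNU touch accepting
"-d @<epoch>", assumes paths contain no newlines and that no path
consists solely of "@" followed by digits, and it skips merge commits
(which list no files by default), so results may differ slightly from
running git log per file. It also still spawns one touch per file, so
the Perl version should remain the faster option.

```shell
#!/bin/sh
# Sketch only: give each file the commit time of the newest commit that
# touched it, using the output of a single `git log` run.
#
# Expected input on stdin (newest commit first):
#   @<epoch seconds>      one line per commit
#   <path>                one line per path changed in that commit
apply_commit_times() {
    awk '
        /^@[0-9]+$/ { ts = substr($0, 2); next }   # commit line: remember its time
        $0 == "" || ts == "" { next }              # skip blanks and stray lines
        !seen[$0]++ { print ts " " $0 }            # first (newest) mention of a path wins
    ' | while IFS= read -r line; do
        ts=${line%% *}      # text before the first space: the epoch time
        path=${line#* }     # everything after it: the path, spaces intact
        # -c: never create paths that no longer exist; -m: change mtime only
        touch -c -m -d "@$ts" -- "$path"
    done
}

# Usage, from the top of the work tree:
#   git -c core.quotePath=off log --pretty=format:'@%ct' --name-only \
#       | apply_commit_times
```

Keeping the parsing separate from the `git log` call makes it easy to
try the function on a hand-written sample before pointing it at a large
repository.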