Re: Fastest way to set files date and time to latest commit time of each one


  Hello everyone!
  I just now managed to find time to work on this again; sorry for
replying so late, but I wanted to send a single reply with the
conclusion if possible.
  I feel very humbled by all your replies. Thanks a lot for your time
and concern with my inquiry!
  Eric's script ships not only in Debian but also in CentOS 7 (and I
guess Red Hat 7 too), at
/usr/share/doc/rsync-*/support/git-set-file-times.
  My use case is similar to his: a cluster of identically configured
web servers with autoscaling (tested up to 100 servers), which rsync
the current version from another server when they boot (or when there
is a new version of any of the websites).
  So currently, if we build the system image of the web servers in
tandem with the central server, everything works smoothly. The problem
is when we recreate any of the pre-saved images from scratch: the
dates then mismatch, and rsync does unnecessary checksumming when the
image is put into production.
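(As background for the archives: rsync's default quick check compares only file size and modification time, so once mtimes are stable the sync becomes a near no-op. A hedged illustration; the hostname and paths below are made up, not our real setup:)

```shell
# Hypothetical host/paths. With matching mtimes, rsync's default
# quick check (size + mtime) skips unchanged files entirely; with
# mismatched mtimes it must read both copies to compare block
# checksums, which is the slowness described above.
rsync -a --delete central.example.com:/srv/www/ /srv/www/
```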
  I will use Eric's script from CentOS 7 as-is from now on, to avoid
the mismatch and to mix pre-saved VM images without issues (i.e.
without the autoscaling slowness).
  Thanks a lot to you all!
  Let me know if any of you ever comes to Uruguay: you've got free beers here!
  Have a great day.


On Sat, Aug 29, 2020 at 01:48, Eric Wong (e@xxxxxxxx) wrote:
>
> Ivan Baldo <ibaldo@xxxxxxxxx> wrote:
> >   Hello.
> >   I know this is not standard usage of git, but I need a way to have
> > more stable dates and times in the files in order to avoid rsync
> > checksumming.
> >   So I found this
> > https://stackoverflow.com/questions/2179722/checking-out-old-file-with-original-create-modified-timestamps/2179876#2179876
> > and modified it a bit to run in CentOS 7:
> >
> > # split only on newlines, so paths containing spaces survive the loop
> > IFS="
> > "
> > for FILE in $(git ls-files -z | tr '\0' '\n')
> > do
> >     # one `git log` per file: this is what dominates the runtime
> >     TIME=$(git log --pretty=format:%cd -n 1 --date=iso -- "$FILE")
> >     touch -c -m -d "$TIME" "$FILE"
> > done
> >
> >   Unfortunately it takes ages for an 84k-file repo.
> >   I see the CPU usage is dominated by the git log command.
>
> running git log for each file isn't necessary.
>
> On Debian, rsync actually ships the `git-set-file-times' script
> in /usr/share/doc/rsync/scripts/ which only runs `git log' once
> and parses it.
>
> You can also get my (original) version from:
> https://yhbt.net/git-set-file-times
>
> >   I know a way I could use to split the work for all the CPU threads
> > but anyway, I would like to know if you guys and girls know of a
> > faster way to do this.
>
> Much of your overhead is going to be from process spawning.
> My Perl version reduces that significantly.
>
> I haven't tried it with 84K files, but it'll have to keep all
> those filenames in memory.  I'm not sure if parallelizing
> utime() syscalls is worth it, either; maybe it helps on SSD
> more than HDD.
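
For anyone reading this in the archives, Eric's point about running `git log` only once can be sketched in plain shell. This is NOT his script, just an illustration of the idea; `set_file_times` is a hypothetical name, and the parser naively assumes no tracked filename begins with an ISO date:

```shell
#!/bin/sh
# Sketch of the single-pass approach: walk history once, newest
# commit first, with --name-only; the first time a path appears,
# that commit's date is the file's last-modified time.
set_file_times() {
    git log --pretty=format:'%ci' --name-only --no-renames |
    awk '
        # a committer-date line like "2020-01-02 03:04:05 +0000"
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { when = $0; next }
        # first (newest) occurrence of each path wins
        NF && !($0 in seen) { seen[$0] = 1; printf "%s\t%s\n", when, $0 }
    ' |
    while IFS="$(printf '\t')" read -r when path; do
        # -c: skip files deleted in later history; -d accepts the ISO date
        if [ -e "$path" ]; then
            touch -c -m -d "$when" -- "$path"
        fi
    done
}
```

Run it from the repository root after a fresh checkout. Paths containing tabs or newlines (or names that look like date lines) would still need the real script's more careful parsing, which is one reason to prefer the shipped Perl version.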



-- 
Ivan Baldo - ibaldo@xxxxxxxxx - http://ibaldo.codigolibre.net/
Freelance C++/PHP programmer and GNU/Linux systems administrator.
The sky isn't the limit!



