mirror rsync idea

Greetings. 

With our recent woes about rsync hitting our storage kind of hard, I
had an idea of how we could make things better for at least our tier1
mirrors and anyone who is actually paying attention. 

The only time our master mirrors change is when we push updates (once a
day) or when epel7beta/rawhide finish. When we have branched, that adds
another set.

Right now, mirrors (or anyone else rsyncing content) just hit the
master mirrors N times a day and pull any changes. Or, if they are
very smart (like a few folks on the mirrors list), they grab the
fullfilelist and only sync if it has changed. Even then, the sync
still hits the full tree when it runs.
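
For anyone who hasn't seen the "only sync if the fullfilelist changed"
trick, here is a rough sketch of it in Python. It assumes the file list
has already been downloaded to a local path. The function names and the
state file are made up for illustration, not taken from any existing
mirror script.

```python
import hashlib
from pathlib import Path


def file_list_digest(list_path):
    """Return the SHA-256 hex digest of the downloaded file list."""
    return hashlib.sha256(Path(list_path).read_bytes()).hexdigest()


def needs_sync(list_path, state_path):
    """True if the file list differs from the one recorded at the last sync."""
    state = Path(state_path)
    if not state.exists():
        # No record yet, so we have never synced: do a sync.
        return True
    return state.read_text().strip() != file_list_digest(list_path)


def record_synced(list_path, state_path):
    """Remember the digest of the list we just synced against.

    Call this only after the sync itself has finished successfully.
    """
    Path(state_path).write_text(file_list_digest(list_path) + "\n")
```

A mirror would call needs_sync() after fetching the list, run its normal
rsync only when that returns True, and then call record_synced().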

I think there's something we can do that might improve things: 

rsync has the ability to make "BATCH MODE" updates (see 'man rsync' and
the BATCH MODE section). 

- enhance the cron job that syncs updates to generate a batch on each
  push and place it on master mirrors. 

- enhance the rawhide/epel7beta/branched crons to generate a batch file
  for rawhide/epel7beta/branched.

- Mirrors that are otherwise caught up can just use the batch file for
  that day. If they are more than a day out of sync it won't help them,
  but if they are in sync it will help a lot: they can skip all the
  metadata fetching and the files that haven't changed and just get
  the things that have.

- Interested parties can just run their sync to pull the day's batch
  file(s). If the files don't exist, then no sync has happened yet and
  they can do nothing. If they do exist, they can download the batch
  and run it.
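
To make the steps above a bit more concrete, here is a rough sketch in
Python of what the two sides might run. The --write-batch and
--read-batch options are the ones described in the BATCH MODE section of
the rsync man page. Everything else (function names, paths, the bare
"-a" option set) is an assumption for illustration; the real cron jobs
use their own options, and the sketch assumes the mirror has already
downloaded the batch file to a local path.

```python
import subprocess
from pathlib import Path


def write_batch_cmd(src, dest, batch_file):
    """Master side: sync src to dest and record the changes in batch_file."""
    return ["rsync", "-a", f"--write-batch={batch_file}", src, dest]


def read_batch_cmd(batch_file, dest):
    """Mirror side: replay a recorded batch onto dest.

    rsync expects dest to match the tree the batch was created against,
    which is why this only works for mirrors that are already caught up.
    """
    return ["rsync", "-a", f"--read-batch={batch_file}", dest]


def apply_batch_if_present(batch_file, dest, run=subprocess.run):
    """Apply the batch if it exists locally; otherwise do nothing.

    Returns True if a batch was applied, False if there was none
    (meaning no push has happened yet).
    """
    if not Path(batch_file).exists():
        return False
    run(read_batch_cmd(batch_file, dest), check=True)
    return True
```

The "run" argument is only there so the logic can be exercised without
actually invoking rsync.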

Caveats: 

- This won't help mirrors that do subsets of things unless they tweak
  the batch files (i.e., if they exclude debuginfo or something).

- This won't help anyone who doesn't opt in to using it. 

- This won't help anyone who is more than a day out of date; they will
  need to sync up normally first. 

- This may result in a "thundering herd" of syncs after the batch files
  appear. I suspect however it still may be a lot less load than people
  doing full useless syncs. 

Thoughts?

kevin


_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
