On 1/5/19 1:57 PM, Danishka Navin wrote:
> Hi Team,
>
> I have a requirement of backing up 3 files (each is around 50M) to a
> central server.
> There are 750-800 client systems in different locations (not on the
> same network), and they are supposed to back up over the internet.
>
> I am planning to use rsync over ssh.

Do be careful with this -- if the files are very different, sftp or scp
will be more efficient. Odds are that on a 50MB file you will not see a
big difference, but rsync will sometimes really bite you when files
change a lot or are totally different. Ask me how I know. :D (off-list)
A small difference x 800 might start to add up... and, depending on the
OS, it's one less package to worry about.

> I need to know what resources and what configuration are required to
> make all these clients back up the content as soon as possible.
> I can provide at least 4 servers instead of a single central server,
> so we can split the number of connections per server.
>
> Regards,

Are the clients reaching out to the backup server, or the backup server
reaching out to the clients? (pushing the backups vs. pulling them)

The first bottleneck you will run into if you try to start 800 SSH
sessions at the same moment on a modern computer is probably the
processor. There is a big CPU load when an SSH connection first starts,
and I can't imagine any modern computer that would enjoy 800 (or even
100) of those hits at the /exact/ same time. This has to be avoided.
Once the connection is established, some computational power is still
needed, but nothing compared to the initial connection.

If pulling the backups to the backup server, your task is really quite
simple: set your script up to start and background the transfers, keep
starting one per second until you hit whatever limit you set for the
number of concurrent rsyncs/scps (experiment), and don't start another
until you drop below that limit. Finally, pkill all rsync sessions
after a period of time so dead and failed connections don't pile up.
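Something like this, roughly -- a pull-side sketch, where clients.txt
(one hostname per line), MAX_JOBS, GRACE_SECS, and the /backups/<host>/
layout are all my own made-up names, not anything from the thread:

```shell
#!/bin/sh
# Pull-side driver sketch: background the transfers, cap concurrency,
# then pkill stragglers after a grace period.
MAX_JOBS=50
GRACE_SECS=3600

# Block until fewer than MAX_JOBS rsync processes are running.
throttle() {
    while [ "$(pgrep -cx rsync || true)" -ge "$MAX_JOBS" ]; do
        sleep 1
    done
}

pull_all() {
    while read -r host; do
        throttle
        rsync -az -e ssh "backup@$host:/path/to/files/" "/backups/$host/" &
        sleep 1   # one start per second spreads out the key-exchange CPU hit
    done < clients.txt

    # After the grace period, kill stragglers so dead and failed
    # connections don't pile up.
    sleep "$GRACE_SECS"
    pkill -x rsync
}

[ -f clients.txt ] && pull_all
```

Tune MAX_JOBS by watching CPU and bandwidth on the server; 50 is just a
starting guess.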
If pushing from the clients to the server, you need to stagger the
starts. Depending on the nature of your project, you could do a
sleep $(($RANDOM % 1000)) at the start of your backup script, use the
IP address to compute a sleep time, or cut a number out of the system
name -- something along those lines. You don't want to manage 800
different scripts, so introduce the delay either by computation or by
randomization. Do that, and you probably won't have to worry about any
other resource, and you can do it all on one computer.

Nick.

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
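The "cut a number out of the system name" idea above could be sketched
like this -- the 900-second window is an arbitrary assumption, and
cksum is just one portable way to turn a hostname into a number:

```shell
#!/bin/sh
# Per-client stagger sketch for the push case: each client computes a
# deterministic delay from its own hostname, so every host always
# starts at the same offset but the fleet spreads over the window.
WINDOW=900

# Delay in [0, WINDOW) derived from this host's name.
stagger_delay() {
    echo $(( $(hostname | cksum | cut -d' ' -f1) % WINDOW ))
}

# In the real backup script you would do something like:
#   sleep "$(stagger_delay)"
#   rsync -az /path/to/files "backup@server:/backups/$(hostname)/"
echo "this host would sleep $(stagger_delay) seconds"
```

Being deterministic rather than $RANDOM-based also makes each client's
backup land at a predictable time every night, which helps when you're
chasing a missing backup.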