Brian Long writes:

> Even on 100BaseT or Gigabit LANs inside the same DC, parallel downloads
> reduce the time it takes a sysadmin to patch their linux host.

Impossible, IF the servers aren't overloaded or incorrectly organized. You have a certain amount of bandwidth into a host and out of the server. If these two are matched, parallel downloads aren't a net win. Only if a) the rpms are split up among multiple servers and b) the server for one part of the rpm set needed by the hosts is overloaded is there a net reduction in the time required to install multiple hosts, and in this case equivalent reductions can generally be obtained by reorganizing the servers so that one server doesn't sit idle while another works.

> When we have farms of linux hosts, reduction in patching time is a huge
> productivity gain. Consider the fact that we have RPM'ized Oracle 9i.
> While the 1.2GB Oracle server RPM is getting downloaded, it sure would
> be nice if 4 other packages were getting downloaded at the same
> time. :-)

Only if there is idle time on the client because a single server is serving multiple hosts while another server sits idle, forming a true bottleneck (wasting server bandwidth). In NO case will an organization that keeps all your servers fully loaded in a serialized install work more slowly. Even if your task organization is poor, splitting the client load among N servers (each with the same set of files they can provide) and doing the downloads serialized will always complete the TOTAL job slightly faster than any client-side parallelization.

This is important in a LAN (where you control the servers). In a WAN, of course, where the server load and redundancy are beyond your control, there can be an advantage to parallelization.

Remember, the LAN SERVERS are ALREADY de facto parallelized. They provide files to N hosts in parallel if the hosts are connected and requesting files. They already use as much as 100% of their bandwidth (less delays due to the CPU processing the requests).
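To make the bandwidth-matching point concrete, here is a back-of-the-envelope sketch. The numbers are assumptions for illustration (a ~1 Gb/s server uplink, four clients, 1.2GB apiece), not measurements:

```python
# Sketch of the bottleneck arithmetic; bandwidth and sizes are assumed.
SERVER_BW = 125_000_000      # bytes/sec out of the server (~1 Gb/s)
N_CLIENTS = 4
BYTES_EACH = 1_200_000_000   # e.g. the 1.2GB Oracle server RPM

# Serialized: each client in turn gets the server's full bandwidth.
serial_time = N_CLIENTS * (BYTES_EACH / SERVER_BW)

# Client-parallel: all four download at once and share the same pipe,
# so each sees SERVER_BW / N_CLIENTS and they all finish together.
parallel_time = BYTES_EACH / (SERVER_BW / N_CLIENTS)

# Either way the wall-clock total is (total bytes) / (server bandwidth).
print(serial_time, parallel_time)   # both 38.4 seconds
```

The shared pipe out of the server is the invariant: rearranging how clients open connections never changes the total byte count or the total capacity.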
You simply cannot get all the files installed any faster than the servers can provide them working at full network capacity the whole time, and getting this to happen is a matter of task organization, not parallelization of connections, especially on the client side. Insisting on client-side parallelization masks carelessness in setting up your servers.

There are lots of ways to set things up in a LAN environment so that servers are always 100% loaded. Most of them will be MORE efficient than parallelizing the clients, since specifying particular servers for particular package sets means that THOSE servers will be idle for at least part of the install period unless you balance server load precisely (difficult when parallelizing CLIENT-side activity).

Yum is already server-parallelized to some extent: if you set up multiple servers, limit the number of simultaneous connections per server, and have fallback sets of server URLs, you should be able to keep a server farm working at (close to) 100% during any sort of multiclient install or update. You cannot do better than 100% utilization on the SERVER side, as this is the fundamental bottleneck. You can easily do worse on the client side. It is easy to lose track of this because of course any given client will appear to be idle waiting for servers at any given moment, but if all the servers are running flat out, who cares? You cannot do any better without adding more servers and/or bandwidth.

Do you understand what I'm saying here, or should I state it in more detail?

> We are deploying yum repos on load-balanced web servers and we're also
> planning to use existing Cisco Content Engines across the globe to cache
> our content. Parallel downloads would be very nice in our environment.
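(As an aside on the multiple-server, fallback-URL setup mentioned above: this can be expressed directly in a yum repo stanza. A minimal sketch with invented hostnames follows; check your yum version's documentation for the exact options supported.)

```ini
[updates]
name=LAN update mirrors
# Invented hostnames; yum treats multiple baseurl entries as fallbacks.
baseurl=http://mirror1.example.com/pub/updates/
        http://mirror2.example.com/pub/updates/
        http://mirror3.example.com/pub/updates/
# roundrobin starts each client at a randomly chosen server, spreading
# load across the farm; "priority" tries them in the order listed.
failovermethod=roundrobin
enabled=1
```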
If the web servers load balance, each has the entire repository set you are installing from, and all of them are always (due to SERVER-side parallelization and load balancing) running at peak capacity, you won't be able to install any faster with client-side parallel downloads. It's a matter of simple arithmetic: the best you can do, timewise, is (bytes to be installed) / (bytes per second of network bandwidth/capacity). Any scheme that really DOES load balance and keeps your network running at capacity into the clients will yield very much the same amount of time, and parallelizing both server side and client side is redundant at best and will actually not work the way you think it will at worst.

   rgb

> /Brian/
> --
> Brian Long                      |       |          |
> IT Data Center Systems          |      .|||.      .|||.
> Cisco Linux Developer           |  ..:|||||||:...:|||||||:..
> Phone: (919) 392-7363           |  C i s c o  S y s t e m s
>
> _______________________________________________
> Yum mailing list
> Yum@xxxxxxxxxxxxxxxxxxxx
> https://lists.dulug.duke.edu/mailman/listinfo/yum