On Fri, 24 Oct 2003, Hedemark, Magnus wrote:

> RGB said:
>
> > There are probably ways to cut this down in an emergency (a serious
> > exploit, for example, that needs to be corked in 36 hours or less
> > at the client level).
>
> Have the primary site push/initiate the rsync to the T1, and the T1
> initiate the rsync to T2 in an emergency. T3's are on their own to
> fetch from T2's. That should give an incredible boost in how quickly
> packages get to the T3's and then to the end node.

Ah, but this violates the first principle of scalable distribution --
client pull, never server push.  Pulling is a client-initiated action;
it requires no remote privileges (which you likely wouldn't get anyway)
and can be authenticated or not as the provider wishes.  Pushing
requires at least SOME measure of authentication and access on the
clients.

You are right, though, that there are lots of ways to improve the
downstream distribution rate should it need to be improved; the bigger
question is how to do it with minimum work and minimum (required)
trust.  No matter what, the clients have to trust the primary server,
but in a client-pull model they can rescind that trust by simply not
pulling.

> Of course if the T1's & T2's are set up properly, would it really
> hurt that much to do an rsync pull up the chain every hour or two?
> Assuming that the deltas are pretty small, that should be a pretty
> low overhead transaction, no?

99% of the time yes, 1% of the time no.  Or something like that.  There
are incremental updates, typically a few packages, which are relatively
common, and there are full upgrades or major multipackage updates,
which are relatively rare and (given a mailing list) could often be
deliberately scheduled.

However, you are dead right that one could adjust the granularity of
the updates to propagate down the mirror chain faster, as long as one
feeds the server load back into the process of adjustment so that one
doesn't crush a mirror underfoot by going TOO fast.  This probably
isn't worth automating with actual load balancing, because it just
isn't that important in general (yet) to update at the finest POSSIBLE
granularity.

Yum is typically scheduled to update end-stage clients via a client
pull sometime overnight, so the real question is whether one can
deliver to the repositories that serve them in time to catch the
first, the second, or the third night's automagic update.  No lag at
all would always catch the first night; a 12 hour lag would always
catch the first or the second night; a 24 hour lag would always catch
the second night; a 36 hour lag the second or third; and so on.
Depending on the scheduling and numbers of T1-T3 servers and the size
of the updates, one could almost certainly reduce the T3-T4 (LAN)
propagation time enough to catch the first-or-second-night window.
But this can be worked out in practice.
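To make the arithmetic concrete, here is a quick back-of-the-envelope
sketch -- pure illustration, not anything yum itself does; the 4 a.m.
run time, the noon drop hour, and the lag figures are all made-up
numbers:

  # Toy calculation: given the hour an RPM lands in the primary
  # repository and the total lag for it to trickle down the mirror
  # chain, which nightly client-pull run first picks it up?

  def first_night_caught(drop_hour, chain_lag_hours, run_hour=4):
      """Night 1 is the run at run_hour the morning after the drop,
      night 2 the morning after that, and so on."""
      available = drop_hour + chain_lag_hours   # hours after midnight of drop day
      night = 1
      while run_hour + 24 * night < available:  # night n's run is at run_hour + 24*n
          night += 1
      return night

  for lag in (0, 12, 24, 36):
      night = first_night_caught(drop_hour=12, chain_lag_hours=lag)
      print("noon drop, %2d hour mirror-chain lag -> caught on night %d"
            % (lag, night))

With those particular numbers the zero and twelve hour lags are caught
the first night and the twenty-four and thirty-six hour lags the
second.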
It is nearly miraculous either way.  Drop an updated RPM into the
primary repository at noon on Monday and by Wednesday morning at the
latest every yum client system connected to the tree is updated with
the new RPM.  Scary, almost.  You'd need to defend the primary
repository with dobermans and electric fences, and its primary
administrator needs to be somebody with paranoid schizophrenic
tendencies and an eye tic.

   rgb

> _______________________________________________
> Yum mailing list
> Yum@xxxxxxxxxxxxxxxxxxxx
> https://lists.dulug.duke.edu/mailman/listinfo/yum

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx