On Fri, 24 Oct 2003, Hedemark, Magnus wrote:

> RGB said:
>
> > There are probably ways to cut this down in an emergency (a serious
> > exploit, for example, that needs to be corked in 36 hours or less
> > at the client level).
>
> Have the primary site push/initiate the rsync to the T1, and the T1
> initiate the rsync to T2 in an emergency. T3's are on their own to
> fetch from T2's. That should give an incredible boost in how quickly
> packages get to the T3's and then to the end node.

Ah, but this violates the first principle of scalable distribution --
client pull, never server push.  Pulling is a client-initiated action;
it requires no remote privileges (which you likely wouldn't get anyway)
and can be authenticated or not as the provider wishes.  Pushing
requires at least SOME measure of authentication and access on the
clients.

You are right, though, that there are lots of ways to improve the
downstream distribution rate should it need to be improved; the bigger
question is how to do it with minimum work and minimum (required)
trust.  No matter what, the clients have to trust the primary server,
but in a client-pull model they can rescind that trust by simply not
pulling.

> Of course if the T1's & T2's are set up properly, would it really
> hurt that much to do an rsync pull up the chain every hour or two?
> Assuming that the deltas are pretty small, that should be a pretty
> low overhead transaction, no?

99% of the time yes, 1% of the time no.  Or something like that.  There
are incremental updates, typically a few packages, which are relatively
common, and there are full upgrades or major multipackage updates,
which are relatively rare and (given a mailing list) could often be
deliberately scheduled.

However, you are dead right that one could adjust the granularity of
the updates to propagate down the mirror chain faster, as long as one
feeds the server load back into the process of adjustment so that one
doesn't crush a mirror underfoot by going TOO fast.  This probably
isn't worth automating with actual load balancing, because it just
isn't that important in general (yet) to update at the finest POSSIBLE
granularity.

Yum is typically scheduled to update end-stage clients via a client
pull sometime overnight, so the real question is whether one can
deliver to the repositories that serve them in time to catch the
first, the second, or the third night's automagic update.  No lag at
all would always catch the first night; a 12 hour lag would always
catch the first or the second night; a 24 hour lag would always catch
the second night; a 36 hour lag the second or third; and so on.
Depending on the scheduling and numbers of T1-T3 servers and the size
of the updates, one could almost certainly reduce the T3-T4 (LAN)
propagation time enough to catch the first-or-second-night window.
But this can be worked out in practice.
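To make the arithmetic concrete, here is a quick back-of-the-envelope
sketch -- pure illustration, not anything yum itself does; the 4 a.m.
run time, the noon drop hour, and the lag figures are all made-up
numbers:

  # Toy calculation: given the hour an RPM lands in the primary
  # repository and the total lag for it to trickle down the mirror
  # chain, which nightly client-pull run first picks it up?

  def first_night_caught(drop_hour, chain_lag_hours, run_hour=4):
      """Night 1 is the run at run_hour the morning after the drop,
      night 2 the morning after that, and so on."""
      available = drop_hour + chain_lag_hours   # hours after midnight of drop day
      night = 1
      while run_hour + 24 * night < available:  # night n's run is at run_hour + 24*n
          night += 1
      return night

  for lag in (0, 12, 24, 36):
      night = first_night_caught(drop_hour=12, chain_lag_hours=lag)
      print("noon drop, %2d hour mirror-chain lag -> caught on night %d"
            % (lag, night))

With those particular numbers the zero and twelve hour lags are caught
the first night and the twenty-four and thirty-six hour lags the
second.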
It is nearly miraculous either way.  Drop an updated RPM into the
primary repository at noon on Monday and by Wednesday morning at the
latest every yum client system connected to the tree is updated with
the new RPM.  Scary, almost.  You'd need to defend the primary
repository with dobermans and electric fences, and its primary
administrator needs to be somebody with paranoid schizophrenic
tendencies and an eye tic.

   rgb

> _______________________________________________
> Yum mailing list
> Yum@xxxxxxxxxxxxxxxxxxxx
> https://lists.dulug.duke.edu/mailman/listinfo/yum

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx