[Yum] other stuff

rgb@xxxxxxxxxxxx (Robert G. Brown) · Wed, 26 Jun 2002 18:34:24 -0400 (EDT)

On 26 Jun 2002, seth vidal wrote:

> > Have it be a little more robust server-wise.  So that if there are multiple 
> > servers, and one of them doesn't happen to be available, it just ignores that 
> > server and uses the other in the list.
> > Why?
> > Because it would be nice to have mirrored servers, and have them both in the 
> > config file.  If one of the servers goes down, then the other one just gets 
> > all of the traffic.
> 
> I thought maybe something like this:
> 
> [serverid]
> name=my cool server
> baseurl=url://mydefault/path/url
> gpgcheck=0
> mirror1=url://mirror1/path/url
> mirror2=url://mirror2/path/url
> 
> etc etc - up to let's say 10 mirrors I'd guess  - to be reasonable.

It would be more symmetric to just have

server0=url://server0/path/url
server1=url://server1/path/url
server2=url://server2/path/url
...

More descriptive, too, as server1 may not be a mirror but a different
server altogether, although one hopes that the requested RPMs and
groups are in the intersection of what the servers provide.
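
A minimal sketch of how the serverN keys might be read back in order,
assuming modern Python 3 and its standard configparser module (not
necessarily what yum itself uses).  The section name, the sample URLs
and the helper name server_urls are placeholders for illustration only.

```python
import configparser
import re

# Hypothetical config fragment using the serverN layout.
SAMPLE = """\
[serverid]
name=my cool server
gpgcheck=0
server0=url://server0/path/url
server1=url://server1/path/url
server2=url://server2/path/url
"""


def server_urls(section):
    """Return the values of all serverN keys, ordered by N."""
    found = []
    for key, value in section.items():
        match = re.fullmatch(r"server(\d+)", key)
        if match:
            # Sort numerically so server10 comes after server2.
            found.append((int(match.group(1)), value))
    return [url for _, url in sorted(found)]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
urls = server_urls(parser["serverid"])
print(urls)
```

Sorting on the numeric suffix keeps the order the admin intended, and
there is no built-in cap on how many servers can be listed.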

You might even have it work its way down the servers, in order, until it
finds all the files it is trying to update.  So if server0 can do a full
update, fine, but if not it tries server1 for any files server0 was missing.
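
The fall-through could look roughly like the sketch below.  It assumes
some way of asking a server which files it offers; list_files is a
hypothetical callable standing in for that, and the catalog dict is
made-up test data.

```python
def assign_files(wanted, servers, list_files):
    """Walk the servers in order; each later server is only asked for
    what the earlier ones were missing.

    Returns (plan, missing): plan maps file name -> server, and missing
    is the set of files that no server offered.
    """
    plan = {}
    missing = set(wanted)
    for server in servers:
        if not missing:
            break  # everything is already accounted for
        available = missing & set(list_files(server))
        for name in available:
            plan[name] = server
        missing -= available
    return plan, missing


# Made-up data: server0 lacks c.rpm, and nobody has d.rpm.
catalog = {
    "server0": {"a.rpm", "b.rpm"},
    "server1": {"b.rpm", "c.rpm"},
}
plan, missing = assign_files(
    ["a.rpm", "b.rpm", "c.rpm", "d.rpm"],
    ["server0", "server1"],
    lambda server: catalog[server],
)
print(plan, missing)
```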

> problems with this:
> 
> how do we tell if a url is really down?

You can't.  There are a few things you can do -- pinging the host, for
example, before trying the URL.  However, the nature of TCP is that
certain classes of network failure are indistinguishable from a host or
port being down.  The best you can do is wait for a TCP timeout on the
socket OR parse any error message returned from the socket if you get
it.  The ping is primarily to avoid the wait for a TCP timeout if the
host won't ping.
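
As an alternative to a ping, one can put an explicit bound on the
connect itself, so a dead host costs a few seconds instead of the
system's default TCP timeout.  The sketch below is not a ping; it
assumes Python 3 and uses the standard socket.create_connection call.
The function name and the default of five seconds are arbitrary choices
for illustration.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Try a TCP connect with a bounded wait.

    Returns (True, None) on success, else (False, reason).  A timeout
    still cannot say whether the host, the network or a firewall is at
    fault; it only caps how long we wait to find out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except socket.timeout:
        return False, "timed out"
    except OSError as err:
        # Covers resolution failures, refused connections, no route, etc.
        return False, str(err)
```

One would call it as port_reachable(host, 80, timeout=3.0) before
committing to a server, and log the reason string on failure.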

> I know how to tell if it's bad, but down?

I'm not sure what this means.  A "bad url" could mean:

  a) The hostname doesn't resolve.  This can be tested in code and an
error returned.
  b) The host resolves but is down (EITHER no path to the host OR the
host is actually down -- the network stack can't really tell).  The ping
test helps with this -- consider the host "down" if it doesn't ping.  Or
skip a ping test and just try to connect to the port.
  c) The host resolves, is up, but the port (specified or defaulted) in
the URL isn't open.  "Not open" even means many things -- the host may
have exceeded its limit of open connections on that port (sometimes it
will be nice enough to tell you this) or it may have no daemon there or
there may be a daemon there but a guardian process like iptables rejects
the connection.  Here you are in a quandary -- if there is no daemon at
all listening you will normally get an immediate "connection refused",
but if a firewall silently drops your packets you MUST wait for a TCP
timeout before you can proceed, and you won't necessarily be able to
tell why you failed to connect if you fail to connect.
  c') If SOME daemon is listening, but you are e.g. rejected, you may
get a (non-apache) message telling you why.  Hard to handle all the
possibilities -- probably should just log the message for a human to
consider and move on.
  d) The host resolves, is up, the port is open, but when you connect
you get any of the many errors associated with a bad request -- no
permissions, a nonexistent path, etc.  Again, log the message for a
human to handle (fix those path typos...)
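
To make the cases above concrete, here is a rough sketch that sorts a
failure into one of those buckets by exception type.  It assumes
Python 3 and a fetch done through urllib, which may not match what yum
uses; the category names and the function name classify_failure are
invented for the example.

```python
import socket
import urllib.error


def classify_failure(exc):
    """Map an exception from a fetch attempt to a rough category."""
    # d) The server answered, but with an HTTP error (403, 404, ...).
    #    HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "bad-path"
    # urllib wraps lower-level socket errors in URLError.reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, BaseException
    ):
        exc = exc.reason
    # a) The hostname does not resolve.
    if isinstance(exc, socket.gaierror):
        return "unresolvable"
    # b) Host down, no path to it, or packets silently dropped.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    # c) Host is up but nothing accepts connections on that port.
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    # Anything else: log it for a human and move on.
    return "other"
```

Only the "timeout" bucket is genuinely ambiguous; the others at least
say which layer complained.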

> we need to keep from having it hang when figuring out what url to use.
> 
> any ideas on that?

The outline above ought to work decently.

  a) Test hostname/ip resolution
  b) if it resolves, optionally ping the host and
  c) if it pings, try to connect to the URL/port and
  d) handle a TCP timeout by either retrying or moving on to the next
host and returning to a)
  e) handle a connection followed by any error by parsing a small set of
"predictable" errors and handling them if possible,
  f) logging ALL errors for humans to take note of and
  g) if the error is a "fatal" error (as in, sorry, don't know what to
do now) move on to the next host and return to a)
  h) until there are no hosts left OR no more files to install.
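
The outline above might be sketched as the loop below.  This is only an
illustration under assumptions: try_server is a hypothetical callable
that does steps a) through c) for one host and returns the names of the
files it managed to fetch, raising TimeoutError on a TCP timeout and
some other OSError on anything it cannot handle.  The retry count is
arbitrary.

```python
import logging

log = logging.getLogger("server-fallback-sketch")


def update(servers, wanted, try_server, retries=1):
    """Work down the server list until nothing is missing.

    Returns the set of files that no server could supply.
    """
    missing = set(wanted)
    for server in servers:              # h) until no hosts are left...
        if not missing:                 #    ...or no more files to install
            break
        for attempt in range(retries + 1):
            try:
                got = try_server(server, missing)       # a) - c)
            except TimeoutError as err:                 # d) retry, then move on
                log.warning("%s: timeout (attempt %d): %s",
                            server, attempt + 1, err)
                continue
            except OSError as err:                      # e) - g) log, next host
                log.error("%s: giving up: %s", server, err)
                break
            missing -= set(got)
            break
    return missing
```

TimeoutError is itself an OSError in Python 3, which is why it is caught
first.  Whatever comes back in the returned set is what a human has to
look at.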

  Finally, if you run out of hosts with required RPMs still not found,
handle it as an error -- sometimes it might be safe to proceed (the
missing RPM is something like PVM -- useful and desired, but not
critical to a post-boot system and can be added in later) and others it
might be something really important (like a kernel or glibc:-).  Again,
humans will likely have to judge which is which, although some yum users
may be ill-equipped for the judgement.
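
One way to give those users a hand is a short list of packages that
should always block the run when missing.  The sketch below assumes
such a list exists; the CRITICAL set and the function name are made up
for illustration, and a real list would have to be chosen by whoever
maintains the site.

```python
# Assumed, site-specific list of packages too important to skip.
CRITICAL = {"kernel", "glibc"}


def judge_missing(missing, critical=CRITICAL):
    """Split missing packages into (blocking, deferrable) name lists."""
    missing = set(missing)
    blocking = sorted(missing & set(critical))
    deferrable = sorted(missing - set(critical))
    return blocking, deferrable


blocking, deferrable = judge_missing(["pvm", "glibc"])
if blocking:
    print("refusing to proceed, missing:", ", ".join(blocking))
if deferrable:
    print("can be added later:", ", ".join(deferrable))
```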

    rgb

> 
> -sv

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx