On 26 Jun 2002, seth vidal wrote:

> > Have it be a little more robust server-wise. So that if there are multiple
> > servers, and one of them doesn't happen to be available, it just ignores that
> > server and uses the other in the list.
> > Why?
> > Because it would be nice to have mirrored servers, and have them both in the
> > config file. If one of the servers goes down, then the other one just gets
> > all of the traffic.
>
> I thought maybe something like this:
>
> [serverid]
> name=my cool server
> baseurl=url://mydefault/path/url
> gpgcheck=0
> mirror1=url://mirror1/path/url
> mirror2=url://mirror2/path/url
>
> etc etc - up to let's say 10 mirrors I'd guess - to be reasonable.

It would be more symmetric to just have

  server0=url://server0/path/url
  server1=url://server1/path/url
  server2=url://server2/path/url
  ...

More descriptive, too, as server1 may not be a mirror but a different
server altogether, although one hopes that the requested RPMs and groups
are in the intersection of what the servers provide.

You might even have it work its way down the servers, in order, until it
finds all the files it is trying to update. So if server0 can do a full
update, fine, but if not it tries server1 for any files server0 was
missing.

> problems with this:
>
> how do we tell if a url is really down?

You can't, not with certainty. There are a few things you can do --
pinging the host, for example, before trying the URL. However, the nature
of TCP is that certain classes of network failure are indistinguishable
from a host or port being down. The best you can do is wait for a TCP
timeout on the socket OR parse any error message returned from the socket
if you get one. The ping is primarily to avoid the wait for a TCP timeout
if the host won't ping.

> I know how to tell if it's bad, but down?

I'm not sure what this means. A "bad url" could mean:

  a) The hostname doesn't resolve. This can be tested in code and an
     error returned.
  b) The host resolves but is down (EITHER no path to the host OR the
     host is actually down -- the network stack can't really tell). The
     ping test helps with this -- consider the host "down" if it doesn't
     ping, bearing in mind that some hosts filter ICMP and won't answer a
     ping even when they are up. Or skip the ping test and just try to
     connect to the port.
  c) The host resolves, is up, but the port (specified or defaulted) in
     the URL isn't open. "Not open" can itself mean many things -- the
     host may have exceeded its limit of open connections on that port
     (sometimes it will be nice enough to tell you this), or there may be
     no daemon there, or there may be a daemon there but a packet filter
     like iptables rejects or silently drops the connection. Here you are
     in a quandary -- if nothing at all is listening you will usually get
     "connection refused" right away, but if the packets are silently
     dropped you MUST wait for a TCP timeout before you can proceed, and
     you won't necessarily be able to tell why you failed to connect.
  c') If SOME daemon is listening, but you are e.g. rejected, you may get
     a (non-apache) message telling you why. Hard to handle all the
     possibilities -- probably should just log the message for a human to
     consider and move on.
  d) The host resolves, is up, the port is open, but when you connect you
     get any of the many errors associated with a bad path -- no
     permissions, a nonexistent path, etc. Again, log the message for a
     human to handle (fix those path typos...)

> we need to keep from having it hang when figuring out what url to use.
>
> any ideas on that?

The outline above ought to work decently.
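To make cases a) through c) concrete, here is a minimal Python sketch of a probe that never waits longer than a caller-chosen timeout. It is only an illustration: the function name `probe`, the result strings, and the `DEFAULT_PORTS` table are assumptions invented for this example, not anything yum defines. Case d) (bad path, permissions) only shows up once you actually speak the protocol, so it is not covered here.

```python
import socket
from urllib.parse import urlsplit

# Assumed default ports for illustration; not taken from yum.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def probe(url, timeout=5.0):
    """Classify a URL's reachability without ever hanging past `timeout`.

    Returns one of the (hypothetical) labels:
    'bad-url', 'unresolved', 'refused', 'timeout', 'unreachable', 'ok'.
    """
    parts = urlsplit(url)
    host = parts.hostname
    try:
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    except ValueError:  # non-numeric or out-of-range port in the URL
        return "bad-url"
    if host is None or port is None:
        return "bad-url"

    # Case a): the hostname doesn't resolve.
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolved"

    family, socktype, proto, _canon, sockaddr = addrs[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)  # bound the wait instead of the OS default
    try:
        sock.connect(sockaddr)
    except ConnectionRefusedError:
        return "refused"      # case c): host up, nothing listening
    except socket.timeout:
        return "timeout"      # case b) or dropped packets: can't tell which
    except OSError:
        return "unreachable"  # no route, network down, etc.
    finally:
        sock.close()
    return "ok"
```

Calling `probe("http://server0/path/url", timeout=3.0)` would return one of the labels above after at most roughly three seconds of connect time (name resolution has its own, separate timeout). Note that "timeout" deliberately lumps together a dead host and a firewall that drops packets, since the client cannot distinguish them.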
Spelled out as a procedure:

  a) Test hostname/ip resolution.
  b) If it resolves, optionally ping the host.
  c) If it pings, try to connect to the URL/port.
  d) Handle a TCP timeout by either retrying or moving on to the next
     host and returning to a).
  e) Handle a connection followed by any error by parsing a small set of
     "predictable" errors and handling them if possible.
  f) Log ALL errors for humans to take note of.
  g) If the error is a "fatal" error (as in, sorry, don't know what to do
     now) move on to the next host and return to a).
  h) Repeat until there are no hosts left OR no more files to install.

Finally, if you run out of hosts with required RPMs still not found,
handle it as an error -- sometimes it might be safe to proceed (the
missing RPM is something like PVM -- useful and desired, but not critical
to a post-boot system and can be added in later) and other times it might
be something really important (like a kernel or glibc :-). Again, humans
will likely have to judge which is which, although some yum users may be
ill-equipped for the judgement.

   rgb

>
> -sv
>
>
> _______________________________________________
> Yum mailing list
> Yum@xxxxxxxxxxxxxxxxxxxx
> https://lists.dulug.duke.edu/mailman/listinfo/yum
>

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx
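For reference, a rough Python sketch of the "work down the server list" loop from steps a) through h) above. Everything named here is hypothetical: `fetch_all`, `FatalServerError`, `safe_to_proceed`, and the caller-supplied `fetch(server, name)` callable are assumptions for illustration, not part of yum. The actual transfer (and its timeout handling) is assumed to live inside `fetch`.

```python
import logging

log = logging.getLogger("failover")


class FatalServerError(Exception):
    """Hypothetical: raised by fetch() when a server cannot be used."""


def fetch_all(servers, wanted, fetch):
    """Try each server in order until every wanted file is found.

    fetch(server, name) is supplied by the caller. It should return the
    file contents, return None if the server lacks the file, or raise
    FatalServerError / OSError (timeouts included) if the server is unusable.
    Returns (found, missing): a dict of name -> (server, data) and a list.
    """
    found = {}
    missing = list(wanted)
    for server in servers:
        if not missing:          # step h): nothing left to install
            break
        remaining = []
        for i, name in enumerate(missing):
            try:
                data = fetch(server, name)
            except (FatalServerError, OSError) as err:
                # steps f) and g): log for a human, move to the next host
                log.warning("giving up on %s: %s", server, err)
                remaining.extend(missing[i:])
                break
            if data is None:
                remaining.append(name)   # ask the next server for this one
            else:
                found[name] = (server, data)
        missing = remaining
    return found, missing


def safe_to_proceed(missing, critical=("kernel", "glibc")):
    """Crude stand-in for the human judgement call: refuse to proceed if
    any still-missing package is on an (assumed) critical list."""
    return not any(name in critical for name in missing)
```

With three servers where the second one is down, `fetch_all(["server0", "server1", "server2"], names, fetch)` takes what it can from server0, logs the failure on server1, and asks server2 only for what is still missing. Whatever ends up in `missing` is what a human (or something like `safe_to_proceed`) has to rule on.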