On Wed, 2 Feb 2005, Eric S. Raymond wrote:

> Robert G. Brown <rgb@xxxxxxxxxxxx>:
> > Having a repository redirect to a site selected by ITS
> > administrators means that I might never know just where my RPMs are
> > coming from.  It also gives crackers a clear target and exploit path
> > on any repository anywhere in a trusted chain of repositories, where
> > the administrator of a local system does not even know (that's the
> > "transparent" part) what the structure of the chain is and where it
> > might dynamically change overnight at the whim of an intermediate
> > repository administrator or cracker that has taken over that system.
>
> This problem is easily solved.  Yum gives users progress messages on
> download anyway; these messages should *always* include the source
> site of the RPM.
>
> To go with this, the default behavior should be to stop and ask for
> confirmation when a redirect takes you to a repository you have not
> previously indicated you trust.  (Any repository in your conf is
> trusted by definition.)

However useful yum is as a hand tool, >>most<< yum invocations on
>>most<< systems are the automated nightly updates (I think that's a
pretty fair statement).  On our campus, hundreds of users who are linux
newbies get updated to the latest security patches without even knowing
it (although it isn't exactly hidden from them, either).  EVERYBODY'S
systems installed from linux@duke get a nightly update by default unless
they explicitly turn it off, which makes our security officer very happy
-- nobody stays vulnerable to a patched exploit more than 24 hours after
the update is put onto the corresponding campus repo, not even the
rankest campus newbie.

So sure, on interactive behavior I agree with you.  Interactive behavior
is more or less "always" used on an original install (I mean, you can
suppress it, but most people probably don't), but the concern is
automated behavior, even on expertly run systems.
If you aren't present to say "yes" to a tell-me-twice, you cannot
observe that for some reason the current update is redirected to
208.183.77.111 somewhere in the wilds of Brazil, and by morning when you
check the logs it is WAY too late.

If you argue that you can put a list of trusted hosts into a config file
that excludes this possibility, well then, why have the "automated
forwarding" done by the repos in the first place?  As you say, the repo
list IS the list of trusted hosts -- the whole issue is whether hosts
you DON'T have in that list can be added by the hosts you DO have on the
list.  If you knew ahead of time what they were (and decided that you
trusted them), you could just add them to the existing repo collection
as failover sites or whatever.  The only point in having finer grained
control is if there are intermediary levels of trust between always
trusted and never trusted.  I'd argue that there are (and do, below),
but that this doesn't require an auxiliary config file containing host
trust information -- just an additional trust variable added to the
existing repo descriptors.

> You speak of the "whim" of an intermediate repository administrator,
> but I think repository administrators are far more likely to have an
> accurate model of repo dependencies than end users are.

Sure, but that's not the point.  It might be more convenient -- it's a
great idea, even, if it can be made secure and can address other
security/trust issues such as gpg signature management, which I actually
think is a more significant issue for most newbie users.  However,
proposed as a transparent forwarding mechanism where the choice of
forwarding sites will often never be seen by humans during the nightly
update, it really significantly increases your security exposure,
especially if you DO use yum to install and occasionally update from a
twig-level repository (one with just a few packages run by a private
individual or single organization).
You also run into potentially greater problems with mix-n-match
dependency loops if the dependency tree in the forwarded sites differs
from the assumptions in your install tree up to that point, but that's
another story.

Please don't misunderstand me -- I do think that there is something that
needs to be done here.  I myself have been experimenting with the idea
of the "mini-repository".  I maintain some 3-4 rpm packages that are
used here and elsewhere to at least some extent, and it is not at all
easy to keep repositories that include them anything like up to date, so
I get a slow stream of bug reports from people running old versions.
For them to update to the latest version often requires a tgz or src rpm
download and rebuild, even though I >>do<< have binary rpm builds that
would likely work on their systems.

I've been trying to assemble them in a "mini-repository" that contains
rpm builds for some 4-6 distribution/CPU arch combos (what I have
available here to build on).  I've been FANTASIZING about making this a
private "yum repository" where the primary install/update mechanism
people would use would be to drop the repo data (provided) into their
yum setup, and just do e.g. "yum install jove" (one of the packages I've
been building, since I've been using jove for too many years to quit
now:-).

What I've built "works" for this, but it is a clunky mess for a user to
set up at their end.  They have to download the yum repo image
corresponding to their distro etc., my gpg key, install the one in the
right place, install the other with the right tool, and only THEN can
they install jove, or dieharder, or whatever.  What is needed here is a
way for the toplevel repository metadata to be encapsulated and
served/retrieved to automate this to where a non-expert could manage it
without needing to know about "rpm --import" or where or how to drop a
repo into /etc/yum.conf or /etc/yum.repos.d/.
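(For concreteness, the manual setup I'm talking about looks roughly like
this -- the repo name, baseurl, and key filename below are invented for
illustration, not a real download site:)

```ini
# /etc/yum.repos.d/rgb.repo -- hypothetical drop-in descriptor the user
# has to fetch and place by hand before "yum install jove" can work.
[rgb]
name=rgb's mini-repository (fc3/i386)
baseurl=http://mini-repo.example.org/fc3/i386/
enabled=1
gpgcheck=1
```

...and then, separately and with a different tool, something like
"rpm --import rgb.gpg.key" so gpgcheck can succeed.  Two files, two
different destinations, two different tools -- that's the clunky mess a
non-expert has to get through before the first install.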
I'd argue that the kind of automation needed isn't transparency of yum
itself, which isn't intended to be transparent but rather functional and
deliberate, but an auxiliary yum management interface to encapsulate
things like this for the non-expert user.

> > I'd be perfectly happy for a repository to SUGGEST possible
> > secondary servers to use in yum's already existing repository
> > fallback mechanism -- I just don't want to follow such a chain
> > blind, especially offsite and into a different organization that I
> > might not trust.
>
> The mechanism required for the repository to make suggestions is the
> same as the mechanism required for it to do transparent redirects.
> The only difference would be policy choice in the client.
>
> You're quite right that following redirects blindly would be bad.
> The correct answer is to have the facility, but make sure that
> following the redirects isn't blind.
>
> I think my proposal and your counter just merged.  For either
> proposal, we need a new piece of per-repository (or perhaps
> per-channel) metadata, which is a package redirect list.
>
> The client (yum) needs to know that it should look at that list when
> a repo search fails to find a match.  It's a client policy issue how
> much checking the client does before chasing the link.

Fair enough.  Although I think there are also host and rpm
authentication issues one might want to wrap into this, given that one
is going to the effort in the first place.  Spoofing is always an issue
-- is yum secure if run across intermediary untrusted networks?  At the
moment, I'd say the only protection is gpgcheck.  Alas, the gpgcheck
interface is very clunky, especially for novice users running their own
systems, so there is a strong temptation to just turn it off the first
time it won't let you install something you want to install.  So I
personally think that enabling secure (ssl?)
retrieval and management of keys is as important as enabling secure
retrieval of other repo metadata.

> > It would be truly lovely if the tool directly supported the kind of
> > usage I just implicitly described -- having a search tool for
> > repositories from which I can install e.g. realplayer, a selection
> > tool to pick a repository from the list that I can trust or which
> > has e.g. SSL credentials I can verify and via which I can obtain
> > trusted gpgcheck keys, a tool for manipulating those keys (another
> > nontrivial problem for the novice/single system administrator that
> > is likely to have them setting "gpgcheck = 0" the first time they
> > encounter a problem, which is likely to be the first time they add
> > a twig-level repository), a tool for turning on access to that
> > repository for the time required to do a single install or update,
> > a tool for turning off access (but preserving the repository in the
> > list) when not deliberately installing or updating from that site.
>
> Agreed.  However, all this stuff depends on having the right metadata
> in the infrastructure first.  That's problem one.
>
> > Sheer numbers of these twig repos in yum significantly increases
> > your security risk and creates the possibility of some nasty chain
> > reactions if any sort of loop topology such as "yum rings" is ever
> > actually established.
>
> I already thought of this one.  Breaking such loops is trivial; you
> keep a list of visited repositories and simply don't add a redirect
> if it's already present in the list.
>
> > Compromising any repository in the ring might suffice in short
> > order to compromise the entire linked set of rings -- maybe even
> > all repositories, anywhere.  That would be tough for linux, and
> > yum, to live down.
>
> Yes, but we already have this problem.  That is, any repo compromise
> could already fuck up a lot of end users badly.
> Redirects wouldn't make it worse, unless the redirection chasing is
> hidden from the user (which I'm not proposing).

But which would be precisely what happens in nearly all (easily 90%+)
cases, especially those involving the very novice admins and users you
want to protect from the real underlying complexity.  I'm a lot less
concerned that (for example) Seth would get burned by such a mechanism
than (say) my current CPS independent study student, who just put his
very first linux installation on his system in the dorms, dual boot.
Next to Seth, a raving paranoiac in an institution is only a bit
threat-challenged, and he reads the log files even while he is asleep to
ensure that no Evil is being done.  My student is utterly clueless about
what or where "/var/log/messages" is, what an RPM "is", what gpg is,
what yum is, what a dependency is, etc.  The yum setup he has was
actually created by Seth et al. for the general campus community, and is
very conservative BECAUSE it has to work transparently for newbies as
well as be useful to experts (who can generally take care of
themselves).

Transparent redirects in automated nightly updates WOULD make the
security problem worse, at least worse in linear proportion to the
number of redirect sites outside the line of explicit trust and control
of the LAN admins (assuming equal probability of compromise of all of
those external sites and their attached failover/dependency branches).

> > Lovely idea in the abstract, terrifying in the concrete.
>
> As I said, I think my proposal and your counter have merged.  We can
> de-terrify this; all we have to do is inform the user of sources and
> have a known_hosts equivalent that stops the user for a check before
> diving into an unknown repository.

There are still a few remaining questions.  First is whether this should
be "a part of yum" or a management layer outside of the basic yum
toolbox as it currently exists.
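(As an aside, the visited-list loop breaking you describe above really
is trivial; here's a sketch in python -- repo names and the redirect map
are invented for illustration:)

```python
def chase_redirects(repo, redirects, visited=None):
    """Walk a repo's redirect chain depth-first, skipping any
    repository already on the visited list, so that ring topologies
    terminate instead of looping forever."""
    if visited is None:
        visited = []
    if repo in visited:
        return visited          # loop broken: already chased this one
    visited.append(repo)
    for target in redirects.get(repo, []):
        chase_redirects(target, redirects, visited)
    return visited

# A three-repo "yum ring": each site redirects on to the next.
ring = {"dulug": ["fedora-us"], "fedora-us": ["dag"], "dag": ["dulug"]}
print(chase_redirects("dulug", ring))  # each repo visited exactly once
```

So yes, the loop itself is easy to break -- my worry isn't the
termination of the walk, it's the trust extended to everything the walk
visits.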
This in turn is related to whether or not one ends up deciding that a
truly automated forwarding mechanism should be inserted into urlgrabber.
Another question is whether or not it is time for differential levels of
trust.  I think that it is, since I use differential trust and yum by
hand already, and suspect lots of other people do as well.  Finally is
the always relevant question of who will do the work -- it is easy to
come up with ideas for useful software (especially useful software that
you want other people to write:-), but having the time and resources to
develop it yourself isn't so easy.  I myself am pretty peaked out, even
with a student working on selected GPL projects I'm interested in.

I personally would argue that this should be done almost entirely
outside of yum itself as a separate GUI/Gtk-based tool, run as root,
probably even launchable from the System Tools menu with the little root
auth popup, so novice users can discover and learn the interface sans
documentation.  Features/specifications:

* A metadata server on toplevel distro repositories.  Metadata itself
  should be XML (primarily to get a clean encapsulation and
  extensibility without breaking during development) and recursive (so
  that branch repositories can put up their own upstream/downstream
  metadata).

* When first run, the tool contacts the toplevel/distro repository
  suitably flagged as such in yum.conf and recursively parses the
  metadata tree, identifying loops, compressing or flagging redundancies
  (as even without loops a repository might appear as a twig or branch
  on multiple branches), and (to the extent possible) validating the
  dependency network.  At a guess, it will have to build a fairly
  complex data structure to manage this, and will probably want to save
  it in /var/cache/yum somewhere and subsequently operate on
  diffs/changes only.
* It presents the results to the user as a tree -- a visualization of
  the relationships you've posited in replies to Seth in other replies
  (I started this reply yesterday but was interrupted, so I'm referring
  to stuff in the future of this note).  The tree should almost
  certainly have a single primary root associated with the actual distro
  or a faithful mirror thereof.

* The user should then be able to go through the tree and take various
  selected actions on the branches.  Prune (branch and all descendant
  branches).  Include (for all yum activities including nightly updates
  -- for trusted repositories).  Select (for a current/immediate yum
  command, likely an install or update of a particular package on that
  branch or twig).  Save (current picture of the active tree to /etc yum
  files).  Manage Perms (set gpgcheck, automatically retrieve/install
  gpg keys from an https drop for branch and/or attached twigs,
  trust/untrust, set roundrobin or failover or whatever).

* And while one is going to all this trouble for the back end
  configuration stuff, it seems reasonable to add a GUI encapsulation of
  the front end commands as well, so one can open the interface, select
  (but not include) the realplayer site on a suitable branch, run a yum
  install, deselect it or just not save the selection permanently, and
  exit.

This requires NO modification of yum's base code, AFAICT, and permits
sysadmins to be able to choose (e.g.) whether or not to even include the
tool.  I do think that it would be worthwhile to consider a single
modification to yum itself: a differential trust variable.  Right now
there is an implicit binary distinction of trust: trusted (in yum.conf
or /etc/yum.repos.d/) or not trusted (not there:-).

When I go to e.g. the dag repo to grab/install an xmms skins rpm or an
xmms plugin to play some specific kind of music file, I DON'T
permanently trust that site and don't want to have to worry about
whether it contains some package that is newer than something I'm
running, but broken or inconsistent, that will overwrite my perfectly
functional and stable setup during the nightly update.  I want to be
able to turn it on, run a single install command (attended), and then
turn it off, without having to create a repo descriptor (by hand), put
it in the proper place, run yum, REMOVE the repo descriptor or comment
it all out, and then return to normal operation.  Of course I >>can<< do
all of that, and do, but even the thought of describing how to do it to
a newbie gives me a headache.  Especially if one wants to actually
gpgcheck the dag repository files.

If one adds a single differential trust variable to the repo descriptors
that basically corresponds to the include/select distinction above, so
that the saved repo data has it set (by the GUI or by hand), one could
add a single command line option such as "-s" instructing yum to use
both repositories from the included (default) list and the selected
(optional/transient) list for the current command only.  This would
simplify yum management, whether by some future imaginary GUI or by
hand.

This is consistent with yum's design philosophy, which is to ALWAYS do
the conservative thing by default, and to only do wild-n-crazy stuff
like descend recursed repository lists interactively and when explicitly
told to (and thence both at your own risk and with you THERE to watch
the results and see if you believe them).  If you DO convince Seth et
al. that recursed lists are a good thing, they'd almost certainly be a
good thing only when run interactively and/or with the -s option.
Either way a lot of sites will want to NOT ever permit the use of this
sort of thing on user workstations.
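(A sketch of what such a descriptor might look like -- the "select"
variable name is my invention for the proposed flag, and the baseurl is
a made-up placeholder, not dag's actual site:)

```ini
# /etc/yum.repos.d/dag.repo -- hypothetical transient repo descriptor.
[dag]
name=dag (twig-level, used only for explicitly selected commands)
baseurl=http://dag.example.org/fedora/3/i386/
gpgcheck=1
enabled=0
# proposed new variable: trusted for "yum -s <command>" only
select=1
```

Note that with enabled=0 current yum can already approximate the
"select" behavior one repo at a time via --enablerepo=dag on the command
line (if your yum build supports it); the -s option would just flip on
every repo so marked in a single stroke.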
If you are a single admin with several hundred workstations to manage,
you're going to want to be totally fascist in your yum update and
install policy and will absolutely not want any sort of interactive
per-system install of stuff you haven't retrieved as a source rpm and
rebuilt for a local repository.  However, individual users will
obviously find it to be very useful, and even paranoid admins will find
having the features useful because, run on their development
workstation, it does a lot of nasty work for them that they now have to
do by hand while putting together a coherent set of rpm's for their
local repo set.

As for who will build it -- almost certainly not me, although it does
look like it would be fun.  I'm just having too much fun already...;-)

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx