On Wed, 2 Feb 2005, Eric S. Raymond wrote:

> Robert G. Brown <rgb@xxxxxxxxxxxx>:
> > Having a repository redirect to a site selected by ITS
> > administrators means that I might never know just where my RPMs are
> > coming from.  It also gives crackers a clear target and exploit path
> > on any repository anywhere in a trusted chain of repositories, where
> > the administrator of a local system does not even know (that's the
> > "transparent" part) what the structure of the chain is and where it
> > might dynamically change overnight at the whim of an intermediate
> > repository administrator or cracker that has taken over that system.
>
> This problem is easily solved.  Yum gives users progress messages on
> download anyway; these messages should *always* include the source
> site of the RPM.
>
> To go with this, the default behavior should be to stop and ask for
> confirmation when a redirect takes you to a repository you have not
> previously indicated you trust.  (Any repository in your conf is
> trusted by definition.)

However useful yum is as a hand tool, >>most<< yum invocations on
>>most<< systems are the automated nightly updates (I think that's a
pretty fair statement).  On our campus, hundreds of users who are linux
newbies get updated to the latest security patches without even knowing
it (although it isn't exactly hidden from them, either).  EVERYBODY'S
systems installed from linux@duke get a nightly update by default unless
they explicitly turn it off, which makes our security officer very happy
-- nobody stays vulnerable to a patched exploit more than 24 hours after
the update is put onto the corresponding campus repo, not even the
rankest campus newbie.

So sure, on interactive behavior I agree with you.  Interactive behavior
is more or less "always" used on an original install (I mean, you can
suppress it, but most people probably don't), but the concern is
automated behavior, even on expertly run systems.
If you aren't present to say "yes" to a tell-me-twice, you cannot
observe that for some reason the current update is redirected to
208.183.77.111 somewhere in the wilds of Brazil, and by morning when you
check the logs it is WAY too late.

If you argue that you can put a list of trusted hosts into a config file
that excludes this possibility, well then, why have the "automated
forwarding" done by the repos in the first place?  As you say, the repo
list IS the list of trusted hosts -- the whole issue is whether hosts
you DON'T have in that list can be added by the hosts you DO have on the
list.  If you knew ahead of time what they were (and decided that you
trusted them), you could just add them to the existing repo collection
as failover sites or whatever.  The only point in having finer grained
control is if there are intermediary levels of trust between always
trusted and never trusted.  I'd argue that there are (and do, below),
but that this doesn't require an auxiliary config file containing host
trust information -- just an additional trust variable added to the
existing repo descriptors.

> You speak of the "whim" of an intermediate repository administrator,
> but I think repository administrators are far more likely to have an
> accurate model of repo dependencies than end users are.

Sure, but that's not the point.  It might be more convenient -- it's a
great idea, even, if it can be made secure and can address other
security/trust issues such as gpg signature management, which I actually
think is a more significant issue for most newbie users.  However,
proposed as a transparent forwarding mechanism where the choice of
forwarding sites will often never be seen by humans during the nightly
update, it really significantly increases your security exposure,
especially if you DO use yum to install and occasionally update from a
twig-level repository (one with just a few packages run by a private
individual or single organization).
You also run into potentially greater problems with mix-n-match
dependency loops if the dependency tree in the forwarded sites differs
from the assumptions in your install tree up to that point, but that's
another story.

Please don't misunderstand me -- I do think that there is something that
needs to be done here.  I myself have been experimenting with the idea
of the "mini-repository".  I maintain some 3-4 rpm packages that are
used here and elsewhere to at least some extent, and it is not at all
easy to keep repositories that include them anything like up to date, so
I get a slow stream of bug reports from people running old versions.
For them to update to the latest version often requires a tgz or src rpm
download and rebuild, even though I >>do<< have binary rpm builds that
would likely work on their systems.

I've been trying to assemble them in a "mini-repository" that contains
rpm builds for some 4-6 distribution/CPU arch combos (what I have
available here to build on).  I've been FANTASIZING about making this a
private "yum repository" where the primary install/update mechanism
people would use would be to drop the repo data (provided) into their
yum setup, and just do e.g. "yum install jove" (one of the packages I've
been building, since I've been using jove for too many years to quit
now:-).

What I've built "works" for this, but it is a clunky mess for a user to
set up at their end.  They have to download the yum repo image
corresponding to their distro etc., my gpg key, install the one in the
right place, install the other with the right tool, and only THEN can
they install jove, or dieharder, or whatever.  What is needed here is a
way for the toplevel repository metadata to be encapsulated and
served/retrieved to automate this to where a non-expert could manage it
without needing to know about "rpm --import" or where or how to drop a
repo into /etc/yum.conf or /etc/yum.repos.d/.
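(For concreteness, the manual setup I'm talking about looks roughly like
this -- the repo name, baseurl, and key filename below are invented for
illustration, not a real download site:)

```ini
# /etc/yum.repos.d/rgb.repo -- hypothetical drop-in descriptor the user
# has to fetch and place by hand before "yum install jove" can work.
[rgb]
name=rgb's mini-repository (fc3/i386)
baseurl=http://mini-repo.example.org/fc3/i386/
enabled=1
gpgcheck=1
```

...and then, separately and with a different tool, something like
"rpm --import rgb.gpg.key" so gpgcheck can succeed.  Two files, two
different destinations, two different tools -- that's the clunky mess a
non-expert has to get through before the first install.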
I'd argue that the kind of automation needed isn't transparency of yum
itself, which isn't intended to be transparent but rather functional and
deliberate, but an auxiliary yum management interface to encapsulate
things like this for the non-expert user.

> > I'd be perfectly happy for a repository to SUGGEST possible
> > secondary servers to use in yum's already existing repository
> > fallback mechanism -- I just don't want to follow such a chain
> > blind, especially offsite and into a different organization that I
> > might not trust.
>
> The mechanism required for the repository to make suggestions is the
> same as the mechanism required for it to do transparent redirects.
> The only difference would be policy choice in the client.
>
> You're quite right that following redirects blindly would be bad.
> The correct answer is to have the facility, but make sure that
> following the redirects isn't blind.
>
> I think my proposal and your counter just merged.  For either
> proposal, we need a new piece of per-repository (or perhaps
> per-channel) metadata, which is a package redirect list.
>
> The client (yum) needs to know that it should look at that list when
> a repo search fails to find a match.  It's a client policy issue how
> much checking the client does before chasing the link.

Fair enough.  Although I think there are also host and rpm
authentication issues one might want to wrap into this, given that one
is going to the effort in the first place.  Spoofing is always an issue
-- is yum secure if run across intermediary untrusted networks?  At the
moment, I'd say the only protection is gpgcheck.  Alas, the gpgcheck
interface is very clunky, especially for novice users running their own
systems, so there is a strong temptation to just turn it off the first
time it won't let you install something you want to install.  So I
personally think that enabling secure (ssl?)
retrieval and management of keys is as important as enabling secure
retrieval of other repo metadata.

> > It would be truly lovely if the tool directly supported the kind of
> > usage I just implicitly described -- having a search tool for
> > repositories from which I can install e.g. realplayer, a selection
> > tool to pick a repository from the list that I can trust or which
> > has e.g. SSL credentials I can verify and via which I can obtain
> > trusted gpgcheck keys, a tool for manipulating those keys (another
> > nontrivial problem for the novice/single system administrator that
> > is likely to have them setting "gpgcheck = 0" the first time they
> > encounter a problem, which is likely to be the first time they add
> > a twig-level repository), a tool for turning on access to that
> > repository for the time required to do a single install or update,
> > a tool for turning off access (but preserving the repository in the
> > list) when not deliberately installing or updating from that site.
>
> Agreed.  However, all this stuff depends on having the right metadata
> in the infrastructure first.  That's problem one.
>
> > Sheer numbers of these twig repos in yum significantly increases
> > your security risk and creates the possibility of some nasty chain
> > reactions if any sort of loop topology such as "yum rings" is ever
> > actually established.
>
> I already thought of this one.  Breaking such loops is trivial; you
> keep a list of visited repositories and simply don't add a redirect
> if it's already present in the list.
>
> > Compromising any repository in the ring might suffice in short
> > order to compromise the entire linked set of rings -- maybe even
> > all repositories, anywhere.  That would be tough for linux, and
> > yum, to live down.
>
> Yes, but we already have this problem.  That is, any repo compromise
> could already fuck up a lot of end users badly.
> Redirects wouldn't make it worse, unless the redirection chasing is
> hidden from the user (which I'm not proposing).

But which would be precisely what happens in nearly all (easily 90%+)
cases, especially those involving the very novice admins and users you
want to protect from the real underlying complexity.  I'm a lot less
concerned that (for example) Seth would get burned by such a mechanism
than (say) my current CPS independent study student, who just put his
very first linux installation on his system in the dorms, dual boot.
Next to Seth, a raving paranoiac in an institution is only a bit
threat-challenged, and he reads the log files even while he is asleep to
ensure that no Evil is being done.  My student is utterly clueless about
what or where "/var/log/messages" is, what an RPM "is", what gpg is,
what yum is, what a dependency is, etc.  The yum setup he has was
actually created by Seth et al. for the general campus community, and is
very conservative BECAUSE it has to work transparently for newbies as
well as be useful to experts (who can generally take care of
themselves).

Transparent redirects in automated nightly updates WOULD make the
security problem worse, at least worse in linear proportion to the
number of redirect sites outside the line of explicit trust and control
of the LAN admins (assuming equal probability of compromise of all of
those external sites and their attached failover/dependency branches).

> > Lovely idea in the abstract, terrifying in the concrete.
>
> As I said, I think my proposal and your counter have merged.  We can
> de-terrify this; all we have to do is inform the user of sources and
> have a known_hosts equivalent that stops the user for a check before
> diving into an unknown repository.

There are still a few remaining questions.  First is whether this should
be "a part of yum" or a management layer outside of the basic yum
toolbox as it currently exists.
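(As an aside, the visited-list loop breaking you describe above really
is trivial; here's a sketch in python -- repo names and the redirect map
are invented for illustration:)

```python
def chase_redirects(repo, redirects, visited=None):
    """Walk a repo's redirect chain depth-first, skipping any
    repository already on the visited list, so that ring topologies
    terminate instead of looping forever."""
    if visited is None:
        visited = []
    if repo in visited:
        return visited          # loop broken: already chased this one
    visited.append(repo)
    for target in redirects.get(repo, []):
        chase_redirects(target, redirects, visited)
    return visited

# A three-repo "yum ring": each site redirects on to the next.
ring = {"dulug": ["fedora-us"], "fedora-us": ["dag"], "dag": ["dulug"]}
print(chase_redirects("dulug", ring))  # each repo visited exactly once
```

So yes, the loop itself is easy to break -- my worry isn't the
termination of the walk, it's the trust extended to everything the walk
visits.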
This in turn is related to whether or not one ends up deciding that a
truly automated forwarding mechanism should be inserted into urlgrabber.
Another question is whether or not it is time for differential levels of
trust.  I think that it is, since I use differential trust and yum by
hand already, and suspect lots of other people do as well.  Finally is
the always relevant question of who will do the work -- it is easy to
come up with ideas for useful software (especially useful software that
you want other people to write:-), but having the time and resources to
develop it yourself isn't so easy.  I myself am pretty peaked out, even
with a student working on selected GPL projects I'm interested in.

I personally would argue that this should be done almost entirely
outside of yum itself as a separate GUI/Gtk-based tool, run as root,
probably even launchable from the System Tools menu with the little root
auth popup, so novice users can discover and learn the interface sans
documentation.  Features/specifications:

* A metadata server on toplevel distro repositories.  Metadata itself
  should be XML (primarily to get a clean encapsulation and
  extensibility without breaking during development) and recursive (so
  that branch repositories can put up their own upstream/downstream
  metadata).

* When first run, the tool contacts the toplevel/distro repository
  suitably flagged as such in yum.conf and recursively parses the
  metadata tree, identifying loops, compressing or flagging redundancies
  (as even without loops a repository might appear as a twig or branch
  on multiple branches), and (to the extent possible) validating the
  dependency network.  At a guess, it will have to build a fairly
  complex data structure to manage this, and will probably want to save
  it in /var/cache/yum somewhere and subsequently operate on
  diffs/changes only.
* It presents the results to the user as a tree -- a visualization of
  the relationships you've posited in replies to Seth in other replies
  (I started this reply yesterday but was interrupted, so I'm referring
  to stuff in the future of this note).  The tree should almost
  certainly have a single primary root associated with the actual distro
  or a faithful mirror thereof.

* The user should then be able to go through the tree and take various
  selected actions on the branches.  Prune (branch and all descendant
  branches).  Include (for all yum activities including nightly updates
  -- for trusted repositories).  Select (for a current/immediate yum
  command, likely an install or update of a particular package on that
  branch or twig).  Save (current picture of the active tree to /etc yum
  files).  Manage Perms (set gpgcheck, automatically retrieve/install
  gpg keys from an https drop for branch and/or attached twigs,
  trust/untrust, set roundrobin or failover or whatever).

* And while one is going to all this trouble for the back end
  configuration stuff, it seems reasonable to add a GUI encapsulation of
  the front end commands as well, so one can open the interface, select
  (but not include) the realplayer site on a suitable branch, run a yum
  install, deselect it or just not save the selection permanently, and
  exit.

This requires NO modification of yum's base code, AFAICT, and permits
sysadmins to be able to choose (e.g.) whether or not to even include the
tool.  I do think that it would be worthwhile to consider a single
modification to yum itself: a differential trust variable.  Right now
there is an implicit binary distinction of trust: trusted (in yum.conf
or /etc/yum.repos.d/) or not trusted (not there:-).

When I go to e.g. the dag repo to grab/install an xmms skins rpm or an
xmms plugin to play some specific kind of music file, I DON'T
permanently trust that site and don't want to have to worry about
whether it contains some package that is newer than something I'm
running, but broken or inconsistent, that will overwrite my perfectly
functional and stable setup during the nightly update.  I want to be
able to turn it on, run a single install command (attended), and then
turn it off, without having to create a repo descriptor (by hand), put
it in the proper place, run yum, REMOVE the repo descriptor or comment
it all out, and then return to normal operation.  Of course I >>can<< do
all of that, and do, but even the thought of describing how to do it to
a newbie gives me a headache.  Especially if one wants to actually
gpgcheck the dag repository files.

If one adds a single differential trust variable to the repo descriptors
that basically corresponds to the include/select distinction above, so
that the saved repo data has it set (by the GUI or by hand), one could
add a single command line option such as "-s" instructing yum to use
both repositories from the included (default) list and the selected
(optional/transient) list for the current command only.  This would
simplify yum management, whether by some future imaginary GUI or by
hand.

This is consistent with yum's design philosophy, which is to ALWAYS do
the conservative thing by default, and to only do wild-n-crazy stuff
like descend recursed repository lists interactively and when explicitly
told to (and thence both at your own risk and with you THERE to watch
the results and see if you believe them).  If you DO convince Seth et
al. that recursed lists are a good thing, they'd almost certainly be a
good thing only when run interactively and/or with the -s option.
Either way a lot of sites will want to NOT ever permit the use of this
sort of thing on user workstations.
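(A sketch of what such a descriptor might look like -- the "select"
variable name is my invention for the proposed flag, and the baseurl is
a made-up placeholder, not dag's actual site:)

```ini
# /etc/yum.repos.d/dag.repo -- hypothetical transient repo descriptor.
[dag]
name=dag (twig-level, used only for explicitly selected commands)
baseurl=http://dag.example.org/fedora/3/i386/
gpgcheck=1
enabled=0
# proposed new variable: trusted for "yum -s <command>" only
select=1
```

Note that with enabled=0 current yum can already approximate the
"select" behavior one repo at a time via --enablerepo=dag on the command
line (if your yum build supports it); the -s option would just flip on
every repo so marked in a single stroke.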
If you are a single admin with several hundred workstations to manage,
you're going to want to be totally fascist in your yum update and
install policy and will absolutely not want any sort of interactive
per-system install of stuff you haven't retrieved as a source rpm and
rebuilt for a local repository.  However, individual users will
obviously find it to be very useful, and even paranoid admins will find
having the features useful because, run on their development
workstation, it does a lot of nasty work for them that they now have to
do by hand while putting together a coherent set of rpm's for their
local repo set.

As for who will build it -- almost certainly not me, although it does
look like it would be fun.  I'm just having too much fun already...;-)

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx