[Yum] More on BitTorrent and YUM

Bill Cox writes:

> As for why local mirrors aren't popular... I suspect it's because there
> are a LOT of sites out there that people download from.  You can't
> mirror them all locally, unless you're a huge ISP.  However, we could
> mirror them all with BT based servers.  I could probably serve RPMs to
> the entire FC4 community from my laptop over wireless.

Actually, I think that 99% of all people install from a (mirror) site
and end up running yum pretty much off of that site (or a mirror of that
(mirror) site) forever.  After all, that gets the person FC X, complete,
often plus some extensions, and FC is fairly functionally complete as
is, especially if dressed up with a few extensions supported by a local
sysadmin.

A relatively FEW people (such as yourself) who obviously have mad skills
and a deep thirst for 7337 applications, peruse not only their original
install base and its update and local additions, but actively seek out
sites that build significant, often cool, extensions, or build tools or
packages left out of the base install for political or legal reasons.  I
do some of this myself, as (for example) I still have a ton of mp3's
that I cannot rerip and hence need xmms-mp3 whether or not Red Hat
is comfortable distributing it (until somebody figures out how to
convert mp3->ogg without modulus noise and signal
degradation/distortion).

However, for those particular packages (and generally there are only a
few of them) there are already multiple strategies for the 7337 that
don't require bittorrent: turning on access to their particular hosting
repositories only when you need them (to install or search for
packages); locally mirroring only the part of those repositories
required to satisfy the dependency tree of the packages you've actually
installed; or setting up yum to ignore everything on those repositories
BUT the packages you are interested in and their minimal support tree.
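
For concreteness, here is a rough sketch of what "off unless asked for,
and limited to named packages" could look like as a repo stanza.  It
uses the yum options enabled, gpgcheck and includepkgs; the repo id,
name and URL are placeholders I made up, and the Python wrapper is only
there to render the stanza, not something yum provides.

```python
import configparser
import io


def restricted_repo_stanza(repo_id, name, baseurl, packages):
    """Render a yum .repo stanza that is off by default and package-limited."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        # Disabled by default; enable it for one command only with
        #   yum --enablerepo=<repo_id> install <package>
        "enabled": "0",
        "gpgcheck": "1",
        # Only the listed packages are visible from this repository.
        "includepkgs": " ".join(packages),
    }
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


stanza = restricted_repo_stanza(
    "extras-example",                       # hypothetical repo id
    "Hypothetical extension repository",    # hypothetical name
    "http://repo.example.invalid/fc$releasever/$basearch",  # placeholder URL
    ["xmms-mp3"],
)
print(stanza)
```

With a stanza like this in place, a plain "yum update" never touches
the extension repository, and even when it is enabled for a single
command it can only offer the packages named in includepkgs.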

Doing a lot of these things "by hand" (custom crafting a solution for
just the few packages associated with a repository) is a good idea (if
not strictly necessary) anyway and is very definitely a for-experts-only
sort of thing to do.  My own experience at least is that some of the
extended/advanced super-repositories have enough hacks and patches and
updated revision numbers that doing a SINGLE UPDATE with them in the
standard update path will often end up with those sites "owning" your
system as they de facto update 1/3 of your base packages, including many
that you have no earthly interest in seeing updated outside of the
standard, reasonably trustworthy, FC tree.  To recover from just this
sort of thing I've had to back off by doing a full reinstall just to
take back control of a system where this happened, and now I am very
careful not to do a "yum update" with repositories outside of my
standard update set active.

In other words, playing the multi-repository game is a job for experts
in an expert-friendly field, not something that most users can manage.

In this sense, bittorrent (enabled/implemented as you describe) will
make things >>worse<< by encouraging people who LACK the skills to try
to do exactly this sort of thing without any thought for or knowledge of
the consequences.  Updating from a list of 30 repositories isn't really
all that wise or desirable -- it's sort of "shades of rpmfind" days,
where nearly anybody could fubar their system by grabbing rpm's built on
top of nearly any base and hey, maybe install them (maybe with --force)
and maybe they'd work.  Yum is brilliant and designed to TRY to keep you
out of trouble, but it cannot work miracles and EACH complete repository
you might include is usually built/layered for INTERNAL consistency --
sometimes (even often) at the expense of global "FC X" consistency.
Every local "fix" or patch or addition that alters an rpm shared by a
large chunk of the base increases divergence from the base and the base
update stream.  This is the basic problem with all package distribution
schemes and the reason testing and validation is actually very
important, at least to people who love stability and hate pain.

Finally, there are the security issues to think of.  When you say
"fairly secure" below it makes me very nervous.  Distributed
distribution protocols (to me) imply a tremendous amount of distributed
trust and "cannot" in general be made truly secure.  That is, they
probably can but the cost of true security is the implementation of
central network authorities that sell you security for out of pocket
money, as e.g. the toplevel SSL CA's do now for even the modest degree
of security provided by at least knowing that the host you are
contacting really is the host that you think it is and not some hacker
kid down the hall with a laptop with a fast network interface.

The internet punishes trust and rewards certainty.  As things are now,
you must trust the administrators of whatever site(s) you install from.
The fewer administrators that there are in the mirror chain back to the
primary distribution sites, the less of a chance that the rpm's you
install will be trojanned.  gpg signatures, md5 checksums, and the like
are all simply lovely ways of verifying file validity, but they rely on
having the correct original signature keys, a correct list of the
checksums, and most of all, on users who don't turn the checks off the
first time an install fails because they don't have the keys installed
and don't know how to get and install them.
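
To make the "correct list of the checksums" dependence concrete, here
is a minimal sketch of checking a downloaded file against a trusted
manifest.  It assumes sha256 rather than the md5 of the day, and the
function names and manifest layout are my own inventions for
illustration.  Note that it proves nothing if the manifest itself came
over an untrusted path.

```python
import hashlib
import hmac
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large rpm's need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_against_manifest(path, name, manifest, algorithm="sha256"):
    """Return True only if `name` is listed and the digest matches."""
    expected = manifest.get(name)
    if expected is None:
        return False  # unlisted files are rejected, never trusted by default
    return hmac.compare_digest(file_digest(path, algorithm), expected.lower())


# Demo with throwaway bytes standing in for a package file.
with tempfile.TemporaryDirectory() as tmp:
    name = "demo-1.0-1.noarch.rpm"
    path = os.path.join(tmp, name)
    with open(path, "wb") as f:
        f.write(b"not a real rpm, just demo bytes")
    manifest = {name: file_digest(path)}  # stands in for the trusted list
    ok = verify_against_manifest(path, name, manifest)
    with open(path, "ab") as f:
        f.write(b"tampered")
    tampered_ok = verify_against_manifest(path, name, manifest)
    unlisted_ok = verify_against_manifest(path, "other.rpm", manifest)

print(ok, tampered_ok, unlisted_ok)
```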

MY vote for yum's next serious extension is an SSL-verified key
retrieval and installation tool, since without SSL (or some other
toplevel certificate authority) in the chain one cannot even (really)
"trust" the keys one downloads and installs assuming one DOES know how
to find, download, and install the right keys for some distribution or
repository.  Figuring out spoofs, redirects, man in the middle attacks,
and so on to circumvent just pointing a browser at what you think is the
right URL is left as an exercise for the studio audience, as is the
rather scary estimate of the number of sites/users that currently just
turn off gpg checking the first time they want a signed rpm from a site
that has new/different keys that they don't know how to install.
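
As a sketch of the kind of tool I mean (not anything yum has today),
the retrieval half could be as small as the following.  It assumes the
key is published at an https URL and leans on Python's default SSL
context, which verifies the certificate chain and the hostname against
the system CA store.  The function names and the URL are hypothetical.

```python
import ssl
import urllib.request
from urllib.parse import urlparse


def verified_context():
    """SSL context with certificate and hostname verification enabled."""
    return ssl.create_default_context()


def fetch_key(url, timeout=30):
    """Download a public key, refusing anything that is not HTTPS."""
    if urlparse(url).scheme != "https":
        raise ValueError("refusing to fetch a signing key from %r" % url)
    with urllib.request.urlopen(
        url, timeout=timeout, context=verified_context()
    ) as resp:
        return resp.read()


# Plain HTTP is rejected before any network traffic happens.
try:
    fetch_key("http://repo.example.invalid/RPM-GPG-KEY")
    refused = False
except ValueError:
    refused = True
print(refused)
```

The fetched bytes would still have to be handed to something like
"rpm --import", and this only moves the trust problem up to whoever
signed the site's certificate, which is rather the point of the
paragraph above.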

Bittorrent will just make all of this worse, in every way, I think, if
only by ENCOURAGING users to start "shopping" dozens of repositories
instead of just one or two that are probably sufficiently "local" that
the network in between is approximately trustworthy.  I could be wrong,
of course, but I don't think so.

> As for technical issues in BT, they can all be addressed, but a new
> protocol will have to be implemented.  It can be similar to BT (even a
> strict super-set).  However, it will be a new protocol.  For me, that's
> the fun part.

And now to the constructive part of my comment.  There are always many
ways to solve any problem.  One way to solve THIS problem (implementing
BT in a "urlgrabber"-like link in the yum installation/retrieval chain)
is, as suggested, to build a complex extension of BT that can be
directly integrated into yum and, with a fair bit of tweaking and
patching and addressing security concerns, eventually turn the entire
FC-X installed base into some sort of massive rpmfind entity.  (Brrrr.
Icy fingers just played up and down my spine:-)

OR, you can just write a BT client to automagically build and update a
local mirror using data in /etc/yum.repos.d/ and yum's idea of what is
currently installed.  Whoa, that sounds like it would actually be pretty
EASY.  You could probably even use yum, or components of yum, to help --
you'd still likely want/need some sort of extended dependency resolution
and the ability to select just PARTS of e.g. dag so as to not de facto
overwrite the entire master FC X base+updates with FC Xdag base+updates
(often different and more advanced in release number but tested outside
of the master FC x test/validation process in ways you may or may not
choose to trust, not picking on dag who after all I use to grab certain
things myself:-).  You'd also want to arrange it so that "yum list" or
"yum info" referred back to the original sites (to get data on stuff
that isn't locally mirrored), but a "yum install" precipitates
automagical mirroring of the installed package and its dependencies.

Easy and useful.  As a number of people have suggested, local updates
completely solve the bandwidth problem and the problem of making yum
"immediately" usable with access to complete repositories.  With 160 GB
disks available for $70 (with rebate) at Circuit City, it is clear that
nearly anybody 7337 can "afford" to keep a local update mirror on
anything bigger than an old laptop (and everybody else will continue to
just use their original base anyway).  If you set up a BT client that
does something like:

  a) Grab and mirror the repo(s) listed as primary mirrorlist entries,
and install it (them) on the LOCAL path(s) of the primary repo(s) on a
given system.  Or something like this, the point being to enable
something like:

[base]
name=Fedora Core $releasever - $basearch - Base
baseurl=file:///yum_repos/dulug/fc$releasever/$basearch
mirrorlist=http://fedora.redhat.com/download/mirrors/fedora-core-$releasever
enabled=1
gpgcheck=1

as pure automagic, creating the baseurl if it doesn't exist and putting
a copy of fedora.redhat.... into it.

  b) Search down the dependency trees of selected extension repos/sites
and mirror them TO THE EXTENT that you have entities installed from
those sites (only).  Here you might even need to extend the
functionality of yum, or use some of its more rarely used features, to
e.g. prevent your system from mirroring ALL of dag just because you have
xmms-mp3 installed, or updating from dag (and replacing 1/3 of your
operational system or more from it).  Here the building of a robust
solution will be tricky, but not because of BT.
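
The selection logic in (b) might be sketched roughly as below.  This is
a toy under loud assumptions: dependencies are plain package names
rather than real rpm provides/requires, "base wins" is the only
conflict rule, and the package data in the demo is invented, not real
metadata from any repository.

```python
from collections import deque


def packages_to_mirror(wanted, repo_packages, base_packages, requires):
    """Return the part of an extension repo worth mirroring locally.

    wanted        -- packages you deliberately installed from this repo
    repo_packages -- names of everything the extension repo offers
    base_packages -- names the base (plus updates) tree can supply
    requires      -- mapping: package name -> names it depends on
    """
    selected = set()
    queue = deque(p for p in wanted if p in repo_packages)
    while queue:
        pkg = queue.popleft()
        if pkg in selected:
            continue
        selected.add(pkg)
        for dep in requires.get(pkg, ()):
            # Base wins: a dependency the base tree can satisfy is never
            # pulled from the extension repo, even if that repo ships it.
            if dep in base_packages or dep not in repo_packages:
                continue
            queue.append(dep)
    return selected


# Invented data: the extension repo also carries its own builds of
# base packages, which is exactly what we do not want to mirror.
repo_packages = {"xmms-mp3", "libcodec-example", "xmms", "glibc", "kernel"}
base_packages = {"xmms", "glibc", "kernel"}
requires = {
    "xmms-mp3": ["xmms", "libcodec-example"],
    "libcodec-example": ["glibc"],
    "xmms": ["glibc"],
}
to_mirror = packages_to_mirror(["xmms-mp3"], repo_packages, base_packages, requires)
print(sorted(to_mirror))
```

Here only xmms-mp3 and the one library that the base cannot supply get
mirrored; the extension repo's own xmms, glibc and kernel are left
alone.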

Of course, for this type of thing rsync still seems like it would be an
easier tool to use as a base and is what I and most others use in their
custom scripts that accomplish pretty much this same thing.  It leaves
you with complete control over the resources devoted to maintaining
mirrors or updating.  Again to my own direct experience, bittorrent-like
solutions can really suck every bit of bandwidth out of a DSL link as
your system is being used to provide chunks of this and that to complete
strangers in exchange for the dubious benefit of being able (far more
rarely) to get chunks of this and that from them.

Anyway, you get the idea.  The basic principle here is that whether you
use BT or rsync or ftp, grabbing truly remote rpm's in an update in real
time sucks, especially over a DSL link.  It's ok for an occasional
install of a small package -- otherwise it just burns lifetime.  It is
therefore desirable to trade some of your cheap, readily accessible disk
for time by pre-downloading and mirroring the RELEVANT part of the
repositories you have in your "permanent" repo list -- all of the base,
plus the relevant minimal chains for package specific additions.
Finally, you need some really smart logic to at least try to keep naive
users from screwing themselves by layering enough repositories on top of
each other that -- "FC X" or not -- they eventually become de facto
incompatible and break the shit out of everything.

> BTW, I don't have a good name for the utility, assuming I do work on it.
> Is BTFS (for BitTorrent File System) any good?  I was also hoping to
> build a fuse interface to the utility that would allow you to mount the
> served directory structure as a local disk.  Is BT-FTP better?

"BTFS" sounds like it is a project in and of itself, if you really plan
to create an actual filesystem (that one mounts and everything).  Again
the basic concept itself sounds like a sysadmin's worst nightmare from a
security and resource control point of view -- lots of strangers putting
lots of files over which you have no control on a locally mounted
"virtual" filesystem, lots of strangers using your resources and
bandwidth to provide and retrieve those files to you and from you.  This
is serious business, and before you start you should investigate
carefully the expected scaling of the solution, other applications that
might use it, how in the world you are going to secure it and keep me
(as root on one of the participating sites) from slipping trojans into
the distribution chain for any of those applications.

This isn't like distributing data-only files -- music or video.  There
the risk is that a binary that might be used to read them has a buffer
overwrite attack that can be exploited by carefully crafted data, which
HAPPENS often enough but isn't horribly LIKELY for all of that.  You're
distributing the core libraries and binaries from which a system is
built.  Simply inserting a SINGLE RPM into the chain that is
"guaranteed" to be an update on all downstream hosts would compromise
basically everybody in the universe that had gpgcheck=0.  Presuming that
your tool is a SUCCESS and that pretty much all the independent FC X
build sites start to participate (at least tens of them), each with
their own personal gpg signatures or no signature at all, the
inclination for sites that use your tool to skip the signature check as
default behavior will be overwhelming.
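
Just to show how cheap it is to find out who is exposed, here is a
sketch that lists the active repositories in a .repo file that have
signature checking off.  It assumes a missing "enabled" means on and a
missing "gpgcheck" means off; a real audit would also have to consult
the [main] section of yum.conf, which this ignores.  The sample text is
invented.

```python
import configparser


def repos_without_gpgcheck(repo_text):
    """Return ids of enabled repos whose gpgcheck is not set to 1."""
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.read_string(repo_text)
    unchecked = []
    for section in cfg.sections():
        if section == "main":
            continue  # global settings, not a repository
        # Assumed defaults: enabled=1 and gpgcheck=0 when not stated.
        enabled = cfg.get(section, "enabled", fallback="1").strip()
        gpgcheck = cfg.get(section, "gpgcheck", fallback="0").strip()
        if enabled == "1" and gpgcheck != "1":
            unchecked.append(section)
    return unchecked


sample = """\
[base]
name=Fedora Core $releasever - $basearch - Base
enabled=1
gpgcheck=1

[extras-example]
name=Hypothetical extension repository
enabled=1
gpgcheck=0

[unused-example]
name=Disabled repository
enabled=0
gpgcheck=0
"""
exposed = repos_without_gpgcheck(sample)
print(exposed)
```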

Unless, as noted above, somebody FIRST fixes yum so that it can securely
retrieve keys, which may require some sort of "registration" of trusted
sites and their SSL identifiers, or any of the usual nightmarish
machinery for extending trust across a fundamentally insecure network.  This is
essential (IMHO) to make yum "safe" for individual non-sysadmin-type
users who don't really know what a gpg key IS let alone how to retrieve
it (securely!) and install it.  In the meantime, using multiple
repositories and shopping far and wide for exotic packagings of this and
that is at your own risk, YMMV, don't blame yum if you break your system
and have to reinstall or live with bizarre bugs.

   rgb
