Re: Some ideas/questions about yum

Hedayat Vatnakhah <hedayat@xxxxxxxx> · Fri, 28 Aug 2009 21:35:33 +0430

Hi again!

Florian Festi <ffesti@xxxxxxxxxx> wrote on Friday, 28 Aug 09, 17:13:10:
On 08/28/2009 12:27 PM, Hedayat Vatnakhah wrote:

  Now, some ideas: 

3. AFAIK, currently yum's primary database file contains information
about packages, and all of the files in directories such as /usr/bin
and /usr/lib, so that it can resolve package and file level
dependencies. Isn't it possible to move file level information outside
primary db (e.g. to primary_file_deps.db) and translate internal
dependencies from file level dependencies to package level dependencies
when creating repositories? (So that provides and requires tables in
primary db only contain package references rather than file
references?). It might even be possible to do this for dependencies
outside the repository; for example, when creating the updates repository,
you can point createrepo at the fedora repository, so it can also translate
all of the file level dependencies of the updates packages.
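
A minimal sketch of the translation step proposed above, assuming a simplified in-memory package record. The `translate_file_deps` function and the record layout are hypothetical and are not createrepo's actual data model; versioned dependencies and files owned by several packages are ignored.

```python
def translate_file_deps(packages):
    """Rewrite file-level requires (paths starting with '/') as package names.

    `packages` maps package name -> {"files": [...], "requires": [...]}.
    Returns (translated, unresolved); `unresolved` keeps the file requires
    that no package in this repository owns.
    """
    # Build a path -> owning package index once, at repository creation time.
    owner = {}
    for name, info in packages.items():
        for path in info["files"]:
            owner.setdefault(path, name)  # first owner wins in this sketch

    translated, unresolved = {}, {}
    for name, info in packages.items():
        reqs, missing = [], []
        for req in info["requires"]:
            if not req.startswith("/"):
                reqs.append(req)          # already a package-level requirement
            elif req in owner:
                reqs.append(owner[req])   # file requirement satisfied in-repo
            else:
                missing.append(req)       # must stay a file-level requirement
        translated[name] = sorted(set(reqs))
        if missing:
            unresolved[name] = missing
    return translated, unresolved


# Made-up example data, for illustration only.
repo = {
    "bash": {"files": ["/bin/bash", "/bin/sh"], "requires": ["glibc"]},
    "glibc": {"files": ["/lib/libc.so.6"], "requires": []},
    "myscript": {"files": ["/usr/bin/myscript"],
                 "requires": ["/bin/sh", "/usr/bin/perl"]},
}
translated, unresolved = translate_file_deps(repo)
```

In this example `/bin/sh` becomes a requirement on `bash`, while `/usr/bin/perl` is owned by no package in the repository and therefore remains a file level dependency.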

Bad idea as you never know all existing repositories. Bad idea because
you don't want to recreate all repos when one of them changes. Bad idea
because changing the data of the packages in the repo is likely to lead to
other problems. 

You don't need to know about all existing repositories, since you can
still resolve file level dependencies. In such cases you'll be forced
to download the other file I mentioned (primary_file_deps.db). 
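
A rough sketch of that fallback, assuming the client always holds a small package level index and fetches the file level data only when a file requirement cannot be satisfied otherwise. The `resolve` function and the `load_file_index` callback (standing in for a download of the proposed primary_file_deps.db) are hypothetical and not part of yum.

```python
def resolve(requires, pkg_provides, load_file_index):
    """Resolve requirements, loading the file index only if it is needed.

    pkg_provides:    dict capability -> package name (always available).
    load_file_index: callable returning dict path -> package name.
    Unresolvable requirements map to None.
    """
    file_index = None
    resolved = {}
    for req in requires:
        if req in pkg_provides:
            resolved[req] = pkg_provides[req]
        elif req.startswith("/"):
            if file_index is None:
                file_index = load_file_index()  # fetched lazily, at most once
            resolved[req] = file_index.get(req)
        else:
            resolved[req] = None
    return resolved


downloads = []

def fake_download():
    # Stand-in for fetching the extra file; records that it happened.
    downloads.append("primary_file_deps.db")
    return {"/usr/bin/perl": "perl"}

only_pkgs = resolve(["glibc"], {"glibc": "glibc"}, fake_download)
with_file = resolve(["glibc", "/usr/bin/perl", "/bin/sh"],
                    {"glibc": "glibc"}, fake_download)
```

The first call never triggers the download; the second triggers it exactly once, even though it contains two file level requirements.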

I don't see why you'll need to recreate all repos when one of them
changes! Sorry :( And it's impossible to state anything about the last
item.

There have actually been efforts long ago to improve the set of files
shipped with the primarydb, to lower the need for downloading the
filelist while still decreasing the number of files shipped in the
primarydb. AFAIK they got rejected by yum upstream at that time because
they also needed cross repo closures (although this was much less
problematic than what you have suggested here). 

  4. Even if the above solution is possible and
can reduce the size of primary db, it won't solve the main problem: for
large repositories, you'll need to download large database files.
You'll need to download extra database files in some use cases anyway.
So, it can be said that currently yum doesn't scale well. 

True. 

  What do you think about it: we can implement
parts of yum at the server side (e.g. a web service), and do queries
online. The client can submit queries to online repositories, aggregate
the results (+using local repositories by itself) and do appropriate
actions. It can also store received data to be used when offline or
while they are valid. It'll be completely backward compatible with the
current clients: those who use the old method can download repositories
themselves, like what they do now. 

It is possible to think about further details and design it completely,
but I want to know your opinions on the whole idea first. 
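
To make the idea a bit more concrete, here is a minimal sketch of the client side, assuming each online repository exposes some query function. The `QueryClient` class, its parameters, and the cache policy are hypothetical; the remote sources are plain callables here rather than real network calls.

```python
import time


class QueryClient:
    """Aggregate search results from remote and local repositories."""

    def __init__(self, remote_sources, local_packages, ttl=3600, clock=time.time):
        self.remote_sources = remote_sources  # name -> callable(query) -> list
        self.local_packages = local_packages  # names known from local repos
        self.ttl = ttl                        # seconds a cached answer is valid
        self.clock = clock
        self.cache = {}                       # (source, query) -> (time, results)

    def _ask(self, name, query, online):
        key = (name, query)
        cached = self.cache.get(key)
        now = self.clock()
        # Offline: use whatever was stored. Online: use it only while valid.
        if cached and (not online or now - cached[0] < self.ttl):
            return cached[1]
        if not online:
            return []
        results = self.remote_sources[name](query)
        self.cache[key] = (now, results)
        return results

    def search(self, query, online=True):
        found = {p for p in self.local_packages if query in p}
        for name in self.remote_sources:
            found.update(self._ask(name, query, online))
        return sorted(found)


calls = []

def fedora(query):
    # Stand-in for a server-side query; the package list is made up.
    calls.append(query)
    return [p for p in ["yum", "yum-utils", "rpm"] if query in p]

client = QueryClient({"fedora": fedora}, ["yum-local-plugin"],
                     ttl=60, clock=lambda: 0)
first = client.search("yum")
second = client.search("yum")                  # answered from the cache
offline = client.search("yum", online=False)   # also from the cache
```

Only the first search reaches the remote source; the repeated and offline searches are served from the stored results.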

Web services have the problem that they don't mix well with our mirrors
infrastructure of simple and stupid http/ftp/rsync servers largely
provided by volunteers. It is also difficult to GPG sign external web
services. Because of this the whole traffic of such web services would
most likely need to run over Fedora infrastructure. 

IMHO, even a single php/python script can provide such an XML-RPC
service (web service was just an example). Mirrors could get this file
just like the other files when syncing. But well, it'll be http only.
The GPG signing issue could be problematic, but would you really need to
sign the traffic?! 

When we think of a Fedora that has grown another order of magnitude
(maybe 2015), it will become hard to argue against a more centralized
solution. Right now we are not at the point where the pain of the local
repo db outweighs the complexity of a web service architecture,
IMHO. 

No, I don't want to call for a centralized solution. 

But there is another way to drastically reduce the amount of data that
has to be transferred: delta metadata.

The repo databases could be split up into deltas in a similar way as
is done with the delta rpms, aka presto. As a result, the metadata of each
package would be downloaded (more or less) exactly once. While this
idea has been around for a while, an implementation is still missing... 
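
A minimal sketch of what such a metadata delta could look like, assuming every record is keyed by an immutable package id so that an update shows up as one removal plus one addition. The `make_delta` and `apply_delta` functions and the data layout are hypothetical; no such format exists in yum or presto as far as this thread states.

```python
def make_delta(old, new):
    """Compute a delta between two metadata snapshots.

    old/new: dict package id (e.g. name-version-release) -> metadata record.
    Records are assumed never to change under the same id.
    """
    return {
        "removed": sorted(k for k in old if k not in new),
        "added": {k: v for k, v in new.items() if k not in old},
    }


def apply_delta(local, delta):
    """Bring a locally stored snapshot up to date using a delta."""
    updated = {k: v for k, v in local.items() if k not in delta["removed"]}
    updated.update(delta["added"])
    return updated


# Made-up package ids and records, for illustration only.
old = {"foo-1.0-1": {"summary": "foo"}, "bar-2.0-1": {"summary": "bar"}}
new = {"foo-1.1-1": {"summary": "foo"}, "bar-2.0-1": {"summary": "bar"}}

delta = make_delta(old, new)
refreshed = apply_delta(old, delta)
```

Here the client would transfer only the record for `foo-1.1-1` plus the id of the removed package, while the unchanged `bar-2.0-1` record is never downloaded again.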

Yes, I know. In fact, at first I decided to start working on that. But
you'll still need to download it once (while it's possible to put the
Fedora repository's metadata onto the Fedora DVD!). I thought that if the
proposed solution works, it would be better than delta metadata files. 

Even if the whole client/server idea is considered bad, there might be
some other ways of organizing the repository metadata so that yum would
still download only the data it needs rather than all data. But currently
I'm interested to hear about this idea...

Thanks,

Hedayat

Florian 

-- 
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list