On 08/28/2009 12:27 PM, Hedayat Vatankhah wrote:
> Now, some ideas:
>
> 3. AFAIK, yum's primary database file currently contains information
> about packages plus all of the files in directories such as /usr/bin
> and /usr/lib, so that it can resolve both package-level and
> file-level dependencies. Isn't it possible to move the file-level
> information out of the primary db (e.g. into a primary_file_deps.db)
> and translate internal file-level dependencies into package-level
> dependencies when creating repositories, so that the provides and
> requires tables in the primary db contain only package references
> rather than file references? It might even be possible to do this for
> dependencies outside the repository; for example, when creating the
> updates repository you could introduce the fedora repository to
> createrepo, so it can also translate all of the file-level
> dependencies of the updates packages.
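For concreteness, the translation step proposed in the quote might look
roughly like the following sketch. The package objects and their .name,
.filelist and .requires attributes are hypothetical stand-ins, not the
actual createrepo data structures:

def build_file_index(packages):
    """Map every file path shipped in the repo to the package
    providing it."""
    index = {}
    for pkg in packages:
        for path in pkg.filelist:
            index[path] = pkg.name
    return index

def translate_requires(packages, file_index):
    """Rewrite file-level requires (e.g. /usr/bin/python) as a
    dependency on the providing package, but only where the provider
    lives inside this same repo."""
    for pkg in packages:
        pkg.requires = [file_index.get(req, req)
                        if req.startswith('/') else req
                        for req in pkg.requires]

Note that the rewrite only works when the providing package is in the
same repo, which is exactly where the objections below come in.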
Bad idea, as you never know all the repositories that exist. Bad idea
because you don't want to recreate all repos whenever one of them
changes. Bad idea because changing the dependency data of the packages
in the repo is likely to lead to other problems.
There have actually been efforts long ago to improve the set of files
shipped in the primary db, to lower the need for downloading the
filelists while still decreasing the number of files shipped in the
primary db (the path filter in question is sketched below). AFAIK they
were rejected by yum upstream at the time because they also needed
cross-repo closures (although this was much less problematic than what
you have suggested here).
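For reference, the pattern below approximates what yum/createrepo have
used to decide which file paths are carried in the primary db, with
everything else relegated to the filelists db; treat the exact set as
an approximation of the historic default, not a normative list:

import re

# Paths matching this pattern go into the primary db.
RE_PRIMARY = re.compile(r'.*bin/.*|^/etc/.*|^/usr/lib/sendmail$')

def is_primary_file(path):
    """True if this file path should be listed in the primary db."""
    return RE_PRIMARY.match(path) is not None

Shrinking or growing that set is the knob those efforts were turning:
fewer paths means a smaller primary db but more filelists downloads.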
> 4. Even if the above solution is possible and can reduce the size of
> the primary db, it won't solve the main problem: for large
> repositories you'll still need to download large database files, and
> in some use cases you'll need to download extra database files
> anyway. So it can be said that yum currently doesn't scale well.
True.
> What do you think about this: we could implement parts of yum on the
> server side (e.g. as a web service) and do queries online. The client
> would submit queries to online repositories, aggregate the results
> (using local repositories itself as well) and take the appropriate
> actions. It could also store the received data to be used while
> offline, or for as long as it remains valid. This would be completely
> backward compatible with current clients: those who use the old
> method could still download the repositories themselves, as they do
> now.
>
> It is possible to think through further details and design it
> completely, but first I want to know your opinions on the whole idea.
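As a rough illustration of the client side of such a service: the
endpoint, URL scheme and JSON shape below are entirely invented;
nothing like this exists in yum today.

import json
import urllib2  # Python 2 stdlib, period-appropriate

_CACHE = {}  # keeps replies around for offline reuse

def resolve_online(repo_url, package):
    """Ask a repository's (hypothetical) resolver service for the
    dependency closure of one package."""
    if package not in _CACHE:
        url = '%s/resolve?pkg=%s' % (repo_url, package)
        reply = json.loads(urllib2.urlopen(url).read())
        _CACHE[package] = reply['closure']  # e.g. a list of NEVRs
    return _CACHE[package]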
Web services have the problem that they don't mix well with our mirror
infrastructure of simple and stupid http/ftp/rsync servers, largely
provided by volunteers. It is also difficult to GPG-sign the responses
of external web services. Because of this, the whole traffic of such
web services would most likely need to run over Fedora infrastructure.

When we think of a Fedora that has grown by another order of magnitude
(maybe by 2015), it will become hard to argue against a more
centralized solution. But right now we are not at the point where the
pain of the local repo db outweighs the complexity of a web service
architecture, IMHO.
But there is another way to drastically reduce the amount of data that
has to be transferred: delta metadata.

The repo databases could be split up into deltas in a similar way to
what is done with delta rpms, aka presto. As a result, the metadata of
each package would be downloaded (more or less) exactly once. While
this idea has been around for a while, an implementation is still
missing... (one possible shape of it is sketched below).
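By analogy to presto: the client keeps per-package metadata keyed by
checksum and fetches only entries it has never seen. Everything below
is a sketch; none of these names exist in yum or createrepo:

def sync_metadata(remote_index, local_store, download):
    """remote_index: {pkg_checksum: metadata_url} from a small index
    file fetched each run. local_store: persistent {checksum: metadata}
    cache. download: function returning a URL's content."""
    for checksum, url in remote_index.items():
        if checksum not in local_store:      # new or changed package
            local_store[checksum] = download(url)
    for checksum in list(local_store):       # drop vanished packages
        if checksum not in remote_index:
            del local_store[checksum]
    return local_store

The small per-run cost is the index file; the per-package metadata is
downloaded (more or less) exactly once, as described above.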
Florian