Re: Software Management call for RFEs

Alek Paunov <alex@xxxxxxxxxxx> · Tue, 28 May 2013 23:04:05 +0300

On 28.05.2013 21:18, seth vidal wrote:
On Tue, 28 May 2013 20:42:13 +0300
Alek Paunov <alex@xxxxxxxxxxx> wrote:
So, it seems that yum already has the "filelists on demand"
optimization implemented. Why are you asking to remove a feature
that does not make things worse ... ?

I'm not.

But when you download the filelists - it is A LOT of data.

It is, of course :-). It is big and slow now, but it implements one more
distinguishing and convenient Fedora feature ... and with a careful
schema and encoding, it can be scaled down several times in both space
and query time.

Actually, every "positive" (install, update) yum operation implies
access to the repos. Repos contain everything. If our software were
perfectly optimized, not only filelists but all other parts of the
database (including primary.files, which you cited initially)
should be lazily synced, right?
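
A minimal sketch of what "lazily synced" could mean, assuming each metadata part is only fetched the first time something asks for it. The class name, the `fetch` callable and the fake repo contents are hypothetical and only for illustration; this is not how yum is implemented.

```python
class LazyRepoMetadata:
    """Fetch each metadata part (primary, filelists, other) only on first use."""

    def __init__(self, fetch):
        # `fetch` is a hypothetical callable: part name -> loaded data.
        # In a real client it would download and decompress the part.
        self._fetch = fetch
        self._parts = {}
        self.fetch_log = []  # records which parts were actually fetched

    def part(self, name):
        # Download on first access only; later calls reuse the cached copy.
        if name not in self._parts:
            self.fetch_log.append(name)
            self._parts[name] = self._fetch(name)
        return self._parts[name]


def demo():
    # Stand-in for a remote repo; the contents are made up.
    fake_repo = {
        "primary": ["pkg-a", "pkg-b"],
        "filelists": ["/usr/bin/a"],
        "other": ["changelog"],
    }
    meta = LazyRepoMetadata(fake_repo.__getitem__)
    meta.part("primary")
    meta.part("primary")  # served from the cache, no second fetch
    return meta.fetch_log
```

In this sketch, an operation that never touches filelists never pays for them, and the same rule applies to every other part.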

I'd rather not have filedeps so it doesn't get pulled in for other
things in depsolving.

Sorry, I do not know how this amount of data will impact libsolv in the
future. IMO, for yum (I mean in the sqlite-based solution) it is a
matter of optimization.

I have a few questions:

   * What is the reasoning behind the splitting of the database across
many .sqlite files?

many? it's 3 afaik. primary, filelists, other.

how do you mean 'many'?

Multiplied by the number of repos. That is what I am trying to
understand: why not just a single .sqlite file for the whole yum database?

   * Why is the SQL schema so denormalized (IMO, this leads to both
bandwidth and disk overspending without speed benefits)? For
example: why do the provides and requires tables not use a common
domain table?

B/c it was designed 8yrs ago and we were going for compressible space
and making it as quick as possible to search?

In the provides and requires example, the missing common domain
(dependency + dependency_evr tables) gains us nothing in space or speed.
As it stands, we have fat and slow text duplication and text indexes
instead of integer references to the domain subnodes (dependencies are
the biggest domain in primary). Yes, in a bunch of cases a little
denormalization is inevitable when we fight for speed, but IMO this and
a few other space flaws have a negative impact on speed too.

   * Why was an incremental update mechanism (e.g. applying XML diffs to
the sqlite database) not considered from the very beginning?

It wasn't necessary? There was a massively smaller number of pkgs to
consider.

Indeed. Also, 8 years ago the possibilities and the number of ideas to 
reuse were definitely different :-)

Thank you,
Alek

--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel