On Mon, Mar 01, 2010 at 10:42:32AM -0600, Mike McGrath wrote:
> It looks like popcon has like 93000 profiles? Smolt has 1.8 million [1]
> and even at that level without package data we have horrible performance
> issues. If I were to add packages with my knowledge of db's, smolt would
> become useless within a month because the thing would be completely
> unavailable.
>
> If someone *really* wants to do this and knows more about databases then I
> do, I'll help them through it. It's a high bar though and not to be taken
> lightly.

IMHO, to begin with there is no need to be able to query complete profiles;
a count per package would be enough. A simple implementation would be:

1) Clients send a plaintext list of installed packages and a UUID every X
   days or on user request.
2) The file is stored as UUID.$timestamp (or as a BLOB in the DB).
3) Once a day a crawler reads all files and counts how often each package
   is installed; the result is stored in a DB for easy querying.
4) All files older than X days are deleted.

The xz-compressed output of "rpm -qa" is 17K on my system, so 1.8 million
profiles would require 31GB of storage. Any approach would need at least
this much storage if we need this level of detail.

The only improvement I can think of is to report only the leaves and
compute the dependencies on the server. Then we can use the output of
"package-cleanup --leaves --all", which is only 4K xz-compressed on my
system, or 7.4 GB for 1.8 million profiles.

Regards
Till
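
A rough sketch of the daily crawler from steps 3 and 4, to make the idea
concrete. Everything specific here is an assumption of the sketch and not
part of the proposal itself: the function name crawl, the spool directory
layout, file names of the form <UUID>.<unix timestamp>, one package name
per line, and the choice to count only the newest report per UUID so that a
machine reporting twice is not counted twice. Writing the result to a DB is
left out.

```python
import os
import time
from collections import Counter


def crawl(spool_dir, max_age_days=30, now=None):
    """Count installs per package and prune old reports.

    Assumes every report in spool_dir is named "<UUID>.<unix timestamp>"
    and holds one package name per line (assumptions of this sketch).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400

    latest = {}  # UUID -> (timestamp, path) of the newest report kept
    for name in os.listdir(spool_dir):
        uuid, _, stamp = name.rpartition(".")
        if not uuid or not stamp.isdigit():
            continue  # not a report file, skip it
        path = os.path.join(spool_dir, name)
        ts = int(stamp)
        if ts < cutoff:
            os.remove(path)  # step 4: drop reports older than X days
            continue
        if uuid not in latest or ts > latest[uuid][0]:
            latest[uuid] = (ts, path)

    counts = Counter()
    for _, path in latest.values():
        with open(path, encoding="utf-8") as fh:
            # A set, so a package listed twice counts once per machine.
            counts.update({line.strip() for line in fh if line.strip()})
    return counts
```

The returned Counter maps package name to number of machines, so
counts.most_common(20) would give the 20 most installed packages. The
crawler only ever holds one counter per package name in memory, which is
why this avoids the per-profile query load mentioned in the quoted mail.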
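
The storage estimates can be rechecked with a few lines. This is only a
back-of-the-envelope sketch: it assumes one stored report per profile,
takes "K" as 1000 bytes and "GB" as 10^9 bytes, and the function name
storage_gb is made up for illustration.

```python
PROFILES = 1_800_000  # Smolt profile count quoted in the thread


def storage_gb(bytes_per_profile, profiles=PROFILES):
    """Total size in decimal gigabytes for one report per profile."""
    return bytes_per_profile * profiles / 1e9


full = storage_gb(17_000)   # xz-compressed "rpm -qa" output
leaves = storage_gb(4_000)  # xz-compressed "package-cleanup --leaves --all"

print(f"full list:   {full:.1f} GB")
print(f"leaves only: {leaves:.1f} GB")
```

With the rounded sizes this gives about 30.6 GB and 7.2 GB. The second
value is slightly below the 7.4 GB quoted above, which presumably came from
the unrounded file size.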
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel