On Sat, 6 Mar 2010, Till Maas wrote:

> On Mon, Mar 01, 2010 at 10:42:32AM -0600, Mike McGrath wrote:
>
> > It looks like popcon has like 93000 profiles? Smolt has 1.8 million [1]
> > and even at that level without package data we have horrible performance
> > issues. If I were to add packages with my knowledge of db's, smolt would
> > become useless within a month because the thing would be completely
> > unavailable.
> >
> > If someone *really* wants to do this and knows more about databases than I
> > do, I'll help them through it. It's a high bar though and not to be taken
> > lightly.
>
> Imho for the beginning, there is no need to be able to query complete
> profiles, but it would be enough to have a count per package. A simple
> implementation for this would be:
>
> 1) clients send a plaintext list of installed packages and a UUID every
>    X days or by user request
> 2) file is stored in UUID.$timestamp (or it is stored as a BLOB in the
>    DB)
> 3) once a day a crawler reads all files and counts for each package how
>    often it is installed, this is stored in a DB for easy querying
> 4) all files older than X days are deleted
>
> The output of rpm -qa, xz-compressed, uses 17K on my system; for 1.8
> million profiles this would require 31GB of storage, but this amount of
> storage would be needed at least for every approach if we need these
> details.
>
> The only improvement I can think of would be to only report the leaves
> and compute the dependencies on the server, then we can use the output
> of "package-cleanup --leaves --all", which is only 4K xz-compressed on
> my system or 7.4 GB for 1.8 million profiles.
>

I'm happy to provide a current dump of the database if you want to
populate it with sample data and see how things look.

	-Mike
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
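
For illustration only, steps 3 and 4 of the scheme Till outlines above could be sketched roughly as follows. This is not Smolt code. The function name count_packages, the "<UUID>.<unix-timestamp>" file naming, the one-package-per-line file format, and the choice to count only the newest report per UUID are all assumptions made for the example. Storing the result in a DB is left out; the function just returns the counts.

```python
import time
from collections import Counter
from pathlib import Path


def count_packages(spool_dir, max_age_days, now=None):
    """Delete expired reports, then count installs per package.

    Hypothetical layout: each report is a plain-text file named
    "<UUID>.<unix-timestamp>" with one package name per line.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    latest = {}  # UUID -> (timestamp, path) of the newest report seen

    for path in Path(spool_dir).iterdir():
        if not path.is_file():
            continue
        uuid, _, stamp = path.name.rpartition(".")
        if not uuid or not stamp.isdigit():
            continue  # not a report file, leave it alone
        ts = int(stamp)
        if ts < cutoff:
            path.unlink()  # step 4: drop reports older than X days
            continue
        # Assumption: only the newest report per UUID is counted.
        if uuid not in latest or ts > latest[uuid][0]:
            latest[uuid] = (ts, path)

    counts = Counter()
    for _, path in latest.values():
        # A set makes each package count at most once per machine.
        packages = {line.strip() for line in path.read_text().splitlines()}
        packages.discard("")
        counts.update(packages)
    return counts
```

A daily cron job could call something like count_packages("/var/spool/pkgreports", max_age_days=30) and write the result into a single table of (package, count) rows. The spool path and the 30-day window are placeholders, not values from this thread. Because only the aggregate is stored, the per-profile package data never has to be queryable in the database.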