Hello,

I have two students interested in a diploma thesis called "Yum plugin for suggesting packages based on usage":

http://bit.ly/18hrHbL

TL;DR: from anonymized access logs, create a database of suggested packages using data mining techniques, and provide a Yum plugin that would suggest "Users of vim also installed: ctags, git, ..."

I am going to create a Fedora Feature wiki page shortly, describing this in more detail. Our goal is to offer this project for integration into Fedora later on, or at least to provide Fedora packages for it.

To do that, we need a good source of data. It would be best to collect access logs from one or two main Fedora mirrors. We would provide a short Python script that parses the access logs, anonymizes the data (IP addresses hashed with a salt) and keeps only the relevant entries (RPM files from the latest Fedora release or updates repositories). That would be phase one, which should give us sample data.

Phase two would be to integrate this script with logrotate, so that for one Fedora release cycle (Fedora 19) the script collects the relevant anonymized data into a file. The final suggested-package database would be created from this file (or maybe files, to allow us to move them out of the stat directory on the fly).

The big (legal) question is whether we are able to provide this anonymized data to the public, or whether we want to sign an NDA with all people involved. I am CCing Tom for this question.

I need your help with connecting to the relevant people. Any comments are appreciated. Many thanks, and I hope this effort will lead to improving the user experience with Fedora packaging.

-- 
Later,
 Lukas "lzap" Zapletal
 irc: lzap #theforeman
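
P.S. To make phase one more concrete, here is a rough sketch of what the parsing script could look like. It is only an illustration, not the final script. It assumes the mirrors write Apache "combined"-style access logs and that the interesting paths contain /releases/19/ or /updates/19/. The function names, the regular expressions and the truncation of the hash to 16 hex characters are all hypothetical choices.

```python
import hashlib
import hmac
import re

# Assumption: Apache "combined"-style lines; only the client IP, the
# request path and the status code are used.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"GET (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

# Assumption: relevant paths look like .../releases/19/... or
# .../updates/19/... and end in .rpm.
RPM_PATH = re.compile(
    r'/(?:releases|updates)/19/(?:.*/)?(?P<file>[^/]+)\.rpm$'
)


def anonymize_ip(ip, salt):
    """Return a keyed hash of the IP; equal IPs give equal tokens."""
    digest = hmac.new(salt, ip.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncation length is arbitrary


def package_name(rpm_file):
    """Strip version, release and arch from 'name-version-release.arch'."""
    nvr = rpm_file.rsplit(".", 1)[0]  # drop the arch suffix
    return nvr.rsplit("-", 2)[0]      # drop version and release


def filter_log(lines, salt):
    """Yield (anonymized_client, package) for successful RPM downloads."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "200":
            continue
        rpm = RPM_PATH.search(m.group("path"))
        if rpm:
            client = anonymize_ip(m.group("ip"), salt)
            yield client, package_name(rpm.group("file"))
```

It could be used along the lines of `for client, pkg in filter_log(open("access.log"), salt): ...`, with `salt` being a secret byte string that stays on the mirror. Keeping the salt secret matters because the IPv4 address space is small enough that hashes could otherwise be reversed by brute force.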