Hi,

----- Original Message -----
> From: "Lukas Zapletal" <lzap@xxxxxxxxxx>
> To: infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> Sent: Friday, 3 May 2013 14:35:45
> Subject: Anonymized access log from a fedora mirror
>
> Hello,
>
> I have two students interested in diploma thesis called Yum plugin for
> suggesting packages based on usage:
>
> http://bit.ly/18hrHbL
>
> TL;DR - from anonymized access log, create a database of suggested
> packages using data mining techniques and provide a Yum plugin that
> would suggest "Users of vim also installed: ctags, git, ..."
>
> I am gonna create a Fedora Feature wiki page shortly describing this in
> more detail. Our goal is to offer this project for integration into
> Fedora later on, at least provide Fedora packages for it.
>
> To do that, we need good source of data. It would be best to collect
> access logs from one or two main Fedora mirrors. We would provide short
> script in Python that would parse access logs and anonymize the data (IP
> address hash-salted) and filtered only relevant data (RPM files from
> latest Fedora release or updates repositories). That would be phase one
> which should give us a sample data.
>
> Phase two would be to integrate this script with logrotate and for one
> Fedora release cycle (Fedora 19) the script would collect relevant
> anonymized data into a file. Final suggested package database would be
> created from this file (or maybe files to allow us to move them on the
> fly out of the stat directory).
>
> The big (legal) question is if we are able to provide this anonymized
> data to public, or if we want to sign NDA with all people involved. I am
> CCing Tom for this question.
>
> I need your help with connecting to relevant people. Any comments are
> appreciated.
>
> Many thanks and I hope this effort will lead to improving user
> experience with Fedora packaging.
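For reference, the phase-one script described above (parse the access log, replace the client IP with a salted hash, keep only RPM downloads from the current release or updates) could look roughly like the sketch below. This is only an illustration, not the students' code. It assumes the mirror writes Apache-style combined log lines and that the paths contain /releases/19/ or /updates/19/. The salt value and the names anonymize_ip, is_relevant and anonymize_line are hypothetical. For the salted hash it uses HMAC-SHA256 keyed with the salt.

```python
import hashlib
import hmac
import re

# Hypothetical salt: a real deployment would generate it randomly and keep it secret.
SALT = b"change-me"

# Assumed input: Apache-style combined log format
# (client IP, identity, user, [timestamp], "request", status, ...).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def anonymize_ip(ip, salt=SALT):
    """Return a keyed SHA-256 digest (HMAC) of the client address."""
    return hmac.new(salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def is_relevant(path, release="19"):
    """Keep only RPM files from the release or updates trees (assumed path layout)."""
    if not path.endswith(".rpm"):
        return False
    return f"/releases/{release}/" in path or f"/updates/{release}/" in path


def anonymize_line(line, release="19"):
    """Return (hashed_ip, timestamp, rpm_file_name), or None if the line is dropped."""
    m = LOG_LINE.match(line)
    if m is None or m["method"] != "GET" or m["status"] != "200":
        return None
    if not is_relevant(m["path"], release):
        return None
    rpm_file = m["path"].rsplit("/", 1)[-1]
    return anonymize_ip(m["ip"]), m["ts"], rpm_file
```

The same client always maps to the same digest, so downloads can still be grouped per client without storing the address. Since the IPv4 address space is small, the hash only protects anything as long as the salt stays secret.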
Not sure from our side, but Debian has long had a package, "popularity-contest", which automatically submits the list of installed packages so that lists of recommended packages like these can be built. It may require some input from the legal team, but as an initiative it sounds good :)

Regards,
Pablo

-- 
Pablo Iranzo Gómez (Pablo.Iranzo@xxxxxxxxxx)
Senior Global Professional Services Consultant
(RHCA, RHCSS, RHCDS, RHCVA, RHCE, RHCSA, RHCSP, JBCAA) #110-215-852
Phone: +34 645 01 01 49 (CET/CEST)
GnuPG KeyID: 0x5BD8E1E4
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure