Re: Anonymized access log from a fedora mirror

On Fri, 3 May 2013 14:35:45 +0200
Lukas Zapletal <lzap@xxxxxxxxxx> wrote:

> Hello,
> 
> I have two students interested in diploma thesis called Yum plugin for
> suggesting packages based on usage:
> 
> http://bit.ly/18hrHbL
> 
> TL;DR - from anonymized access log, create a database of suggested
> packages using data mining techniques and provide a Yum plugin that
> would suggest "Users of vim also installed: ctags, git, ..."

So can you explain how this would work?

How do we know that any particular person who installed yum installed
anything else? Are you using the IP address to try to see what each
user installed? I can think of... a lot of ways that won't work. ;)
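
To make the question concrete, here is a minimal sketch of what I assume
the proposal amounts to: group downloads by some per-client key (the
hashed IP, presumably) and count which packages turn up together. The
function name and the (client_key, package) record format are made up
for illustration; nothing here comes from the actual proposal.

```python
from collections import Counter, defaultdict


def co_installed(records, package):
    """Count packages fetched by the same clients that fetched `package`.

    `records` is an iterable of (client_key, package_name) pairs. This
    format is an assumption made for illustration only.
    """
    # Collect the set of packages seen for each client key.
    per_client = defaultdict(set)
    for client_key, name in records:
        per_client[client_key].add(name)

    # For every client that fetched `package`, count its other packages.
    counts = Counter()
    for names in per_client.values():
        if package in names:
            counts.update(names - {package})
    return counts


records = [
    ("client-a", "vim"), ("client-a", "ctags"), ("client-a", "git"),
    ("client-b", "vim"), ("client-b", "git"),
    ("client-c", "emacs"),
]
print(co_installed(records, "vim").most_common(1))
```

Everything here depends on one client key meaning one user. A key built
from an IP address does not guarantee that: several machines can share
one address, and one machine can change address over time.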

Another approach might be to work on https://fedorahosted.org/census/.
This is the replacement for smolt, but it never seems to have gotten
very far. It would be an application end users install.

> I am gonna create a Fedora Feature wiki page shortly describing this
> in more detail. Our goal is to offer this project for integration into
> Fedora later on, at least provide Fedora packages for it.
> 
> To do that, we need good source of data. It would be best to collect
> access logs from one or two main Fedora mirrors. We would provide
> short script in Python that would parse access logs and anonymize the
> data (IP address hash-salted) and filtered only relevant data (RPM
> files from latest Fedora release or updates repositories). That would
> be phase one which should give us a sample data.

We had a discussion about making our logs public a while back, and I
think that discussion ended with us saying the IP addresses wouldn't be
safe to publish, even hashed.
 
http://lists.fedoraproject.org/pipermail/infrastructure/2012-April/011658.html
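
For reference, the phase-one filtering and hashing step quoted above
could be sketched roughly as follows. This is only an illustration under
stated assumptions: the logs are in the common/combined log format, the
"/releases/" and "/updates/" path fragments are hypothetical stand-ins
for whatever the mirror layout really is, only status 200 requests are
kept, and the salted hash is done with HMAC-SHA256. None of these
choices come from the proposal itself.

```python
import hashlib
import hmac
import re

# Assumption: common/combined log format, e.g.
# 192.0.2.1 - - [03/May/2013:14:35:45 +0200] "GET /path HTTP/1.1" 200 123 ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def anonymize(lines, salt, wanted=("/releases/", "/updates/")):
    """Yield (token, rpm_file_name) for successful RPM downloads.

    `salt` is a secret byte string; `wanted` holds hypothetical path
    fragments marking the repositories of interest.
    """
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "200":
            continue
        path = match.group("path")
        if not path.endswith(".rpm"):
            continue
        if not any(fragment in path for fragment in wanted):
            continue
        # Keyed hash of the IP address, truncated to 16 hex characters.
        digest = hmac.new(salt, match.group("ip").encode(), hashlib.sha256)
        yield digest.hexdigest()[:16], path.rsplit("/", 1)[-1]
```

Note that this does not settle the concern above. The IPv4 address space
is small, so anyone who learns the salt can hash every possible address
and reverse the tokens.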

> Phase two would be to integrate this script with logrotate and for one
> Fedora release cycle (Fedora 19) the script would collect relevant
> anonymized data into a file. Final suggested package database would be
> created from this file (or maybe files to allow us to move them on the
> fly out of the stat directory).
> 
> The big (legal) question is if we are able to provide this anonymized
> data to public, or if we want to sign NDA with all people involved. I
> am CCing Tom for this question.

It's been asked before.

I want to be cautious about this. ;) 

> I need your help with connecting to relevant people. Any comments are
> appreciated.
> 
> Many thanks and I hope this effort will lead to improving user
> experience with Fedora packaging.

kevin
