Good morning everyone,

As you know, Ralph and I went to DevConf last weekend, and of course, what happens when you put two hackers in the same room? Well, they go crazy and start hacking...

The result of this is summershum.

The idea originates from a discussion between Mickael Scherrer, Ralph and me on Friday evening. Could we track all the files in every package in the distribution? Ideally, this would allow us to investigate questions like:
- How many copies of the GPL license are shipped?
- How many copies of the GPL license still carry the old FSF address?
- How many copies of jquery or md5.c are shipped?
- How many files changed between two releases?

So Ralph and I wrote summershum. It is a simple database storing, for each file in each package:
- the package name
- the filename
- the sha1sum of the file
- the tarball name
- the md5sum of the tarball
- a creation date

Next to the database is a fedmsg consumer that, for each new upload to the lookaside cache, downloads the new tarball, extracts it and fills the database with the sha1sum of every file found.

There is an RFE open on the project to store the same information for the binary RPMs themselves. This would run for each successful build on koji.

The project is currently at:
https://github.com/ralphbean/summershum

It comes with a summershum-cli, which uses datagrepper to retrieve the recent uploads to the lookaside cache and loads them into the database.

I think the current state is good enough to start deploying it, but we wanted to announce and discuss it before taking any further action.

So, what do you think?

Cheers,
Pierre
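
P.S. To make the indexing step a bit more concrete, here is a rough Python sketch of what "sha1sum every file in a tarball and store it" could look like. This is not the summershum code: the use of SQLite, the table layout and the function names (index_tarball, copies_of) are assumptions made purely for illustration, and the fedmsg and download parts are left out entirely.

```python
import hashlib
import os
import sqlite3
import tarfile
from datetime import datetime, timezone

# Hypothetical schema mirroring the six fields listed in the mail.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    package     TEXT,
    filename    TEXT,
    sha1sum     TEXT,
    tarball     TEXT,
    tar_md5sum  TEXT,
    created     TEXT
)
"""


def md5_of_file(path):
    """Return the hex md5sum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tarball(conn, package, tarball_path):
    """Store one row per regular file found in the tarball.

    Returns the number of files indexed.
    """
    conn.execute(SCHEMA)
    tar_md5 = md5_of_file(tarball_path)
    tarball = os.path.basename(tarball_path)
    created = datetime.now(timezone.utc).isoformat()
    count = 0
    with tarfile.open(tarball_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, devices
            content = tar.extractfile(member).read()
            sha1 = hashlib.sha1(content).hexdigest()
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
                (package, member.name, sha1, tarball, tar_md5, created),
            )
            count += 1
    conn.commit()
    return count


def copies_of(conn, sha1sum):
    """List (package, filename) pairs whose content has this sha1sum."""
    rows = conn.execute(
        "SELECT package, filename FROM files WHERE sha1sum = ?", (sha1sum,)
    )
    return rows.fetchall()
```

With something like this, a question such as "how many copies of this exact GPL text do we ship?" becomes a single lookup on the sha1sum of a reference copy of the license.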
_______________________________________________ infrastructure mailing list infrastructure@xxxxxxxxxxxxxxxxxxxxxxx https://admin.fedoraproject.org/mailman/listinfo/infrastructure