Re: Statistics. Stats for installed or downloaded packages

> On Mon, 5 Nov 2018 at 09:56, Anatoli Babenia <anatoli(a)rainforce.org> wrote:
> 
> I think the page should be archived/removed. Mainly because a lot of
> the questions people want answers for usually also get in the way of
> people wanting privacy.

I agree that the page in its current state is not useful, but why do you
propose to censor information about how Fedora handles privacy instead
of explaining it on a case-by-case basis?

Without statistics, people have a hard time arriving at a shared view of
the world, which they need in order to act together. For example, with
qdigidoc stats we could try to get some funding for Fedora development
from the EU. The collection could also be an opt-in feature, like
https://popcon.debian.org/

I am not saying that the stats reflect real usage, but with some
adjustments they can still be useful.

> Currently there is no way to know what
> packages are being installed/downloaded the most. yum and dnf
> downloads do not provide those answers on purpose (it would require more
> computational power on the servers than we have and it can't be easily
> made anonymous). The data we can get is only basic information like
> 'what version of yum/dnf used', 'what arch was asked for', 'what was
> the version of Fedora/EPEL wanted' and 'what was the public ip
> address'. This loses all kinds of additional information and masks
> things like proxies, mock builds, etc which inflate/deflate numbers in
> different ways.

Just a hypothesis: if HTTPS/SSH and the dnf protocol use fixed-size packets
and encryption increases the size proportionally, then I can guess the
combination of packages being installed from the time and size of the
request, so it doesn't help to hide that.
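
To make the hypothesis concrete, here is a rough sketch of the guessing step.
Everything in it is an assumption for illustration: the package sizes are made
up, `candidate_sets` is a hypothetical helper, and a real observer would also
have to account for protocol overhead, compression and metadata downloads.

```python
from itertools import combinations

# Hypothetical package sizes in bytes; real values would come from repo metadata.
PACKAGE_SIZES = {
    "qdigidoc": 1_204_000,
    "vim-enhanced": 1_507_000,
    "htop": 104_000,
    "tmux": 325_000,
}


def candidate_sets(observed_bytes, sizes=PACKAGE_SIZES, tolerance=0):
    """Return every package combination whose total size is within
    `tolerance` bytes of the observed transfer size."""
    names = sorted(sizes)
    matches = []
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            total = sum(sizes[name] for name in combo)
            if abs(total - observed_bytes) <= tolerance:
                matches.append(combo)
    return matches
```

With a small package set most totals are unique, so one observed size narrows
the candidates a lot. The brute-force search is exponential in the number of
packages, so this only shows the idea, not a practical attack.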

Recording IPs is a big deal on its own. But for stats it can be replaced with
just incrementing a counter. You also forgot to mention virtual machines
and containers, which inflate the numbers too. I don't believe that
anybody currently has an incentive to push the usage numbers for
`qdigidoc` above real usage, and even if that's the case, the people on
the other side can validate the data against the number of sessions with
unique ID cards from Fedora to their servers. That's the whole point of
it: making the first step and passing the ball to the other side.
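
As a minimal sketch of the "just a counter" idea, assuming the server sees the
package name on each download: `DownloadStats` and its methods are hypothetical
names, not anything that exists in Fedora infrastructure, and the client
address is accepted only to show that it is thrown away.

```python
from collections import Counter


class DownloadStats:
    """Per-package download counter that never stores client addresses."""

    def __init__(self):
        self._counts = Counter()

    def record(self, package, client_ip=None):
        # client_ip is deliberately ignored: only the counter is incremented.
        self._counts[package] += 1

    def top(self, n=10):
        """Return the n most downloaded packages as (name, count) pairs."""
        return self._counts.most_common(n)
```

This does nothing about inflation from proxies, mock builds, VMs or containers;
it only shows that a count can be kept without keeping the IP.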

Also, for file-serving mirrors I'd expect the bottleneck to be bandwidth,
not processing power. Storing IPs for each request can be inefficient, but
can we get some statistics about that? I could not find any example mirror
at https://nagios.fedoraproject.org/nagios/

> That page is even older than the one you pointed to and should also be
> archived/removed. We are probably on Statistics 5.0
> 
> 
> I am sorry but there is no way to answer that question.

I want to believe, but since you triggered my paranoia from the start: is
there a dump of a client-server session, with logs, that could be used for
a proper privacy audit? Now I need to feed my inner lawyer. :D