On 1 December 2014 at 05:05, Reindl Harald <h.reindl@xxxxxxxxxxxxx> wrote:
On 01.12.2014 at 12:57, Pierre-Yves Chibon wrote:
no number is in fact better than wrong numbers backed by nothing because they lead to wrong conclusions - your 122/133 numbers could in reality also be 1000 users who installed them from mirrors, and your calculation is the best example of wrong assumptions
On Mon, Dec 01, 2014 at 12:38:24PM +0100, Reindl Harald wrote:
On 01.12.2014 at 12:36, Alec Leamas wrote:
On 01/12/14 12:29, Reindl Harald wrote:
On 01.12.2014 at 12:26, Alec Leamas wrote:
Let's face it: I envy those who can measure the usage from a download
counter or so. Can we have something similar?
no - you have no clue which mirror was used without explicit tracking in
YUM/DNF and given the noise about the recent Firefox changes you won't
even consider seriously tracking inside the distribution
additionally downloads are meaningless - many setups with more than one
machine have their local mirrors and a download can be 1, 10 or 50
installed instances
I hesitated when writing my initial message, didn't include this:
Feedback why this is impossible isn't really helpful here, most of us
are aware of the limitations. Given that we agree on the overall goals
(?), useful input is what can be done, and how
it is helpful because the fact it is impossible will shut down that
discussion because - well, it's impossible
The question becomes, is any number better than no number?
In theory, we could get an idea of how much a package is downloaded. Mirrors
sync all the content, so they introduce a baseline, while users are what
introduces the variability.
So if we were to be able to gather logs from a) the main repos + b) some
volunteer repos, we could get a trend.
The number would of course not be exact as you mentioned but we could get an
idea, something like: we have 132 mirrors and my package was downloaded 133
times, which potentially means there is one user (me) using that package.
There might be more, but if no-one ever reports a bug and we see the number of
downloads is basically equal to the number of mirrors, we can get an impression
that this package isn't used by many people.
So we come back to the question: is any number better than no number at all?
Even to get a trend?
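The baseline arithmetic described above can be written as a minimal sketch. It assumes, as the message does, that every mirror fetches each package exactly once; the function name is hypothetical and only illustrates the subtraction, not an existing Fedora tool.

```python
def estimated_user_downloads(total_downloads: int, mirror_count: int) -> int:
    """Downloads left after removing one assumed fetch per mirror.

    Assumption: each mirror fetches the package exactly once, so the
    mirror count is the baseline and anything above it comes from users.
    """
    # Clamp at zero: fewer downloads than mirrors only means the
    # one-fetch-per-mirror assumption did not hold for this package.
    return max(total_downloads - mirror_count, 0)


# The example from the message: 132 mirrors, 133 downloads.
print(estimated_user_downloads(133, 132))  # 1
```

The result is only as good as the baseline assumption, which is exactly what the replies dispute.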
While that is 'true', most of the world doesn't work on 'true'. Your car's speedometer doesn't give you the accurate km/hour. [Even a BMW digital one has a +/- 2 km/hour error due to all the factors from present tire size to road conditions.] The question is whether you can accurately remove enough noise to feel confident that you are doing 100 km/hour versus 120 km/hour. The same goes for measuring downloads.
you don't have 132 downloads because 132 mirrors
in fact you have *zero* - mirrors are done with rsync
Not all mirrors rsync. Quite a few mirror via http using scripts that were originally ftp ones, which is why I don't use the numbers from dl.fedoraproject.org for matching usage of packages. So many sites mirror locally using http that the noise on dl.fedoraproject.org is impossible to pull out. What one can do instead is take multiple sub-mirrors (e.g. the ones that only yum/dnf talks to) and then statistically analyze the data from their logs to see if you can figure out the noise. If that is possible, then some sort of number of users and most-used packages should be possible to collect.
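One very rough way to start separating mirroring noise from real clients in sub-mirror logs is sketched below. Everything here is an assumption for illustration: the `(client_id, package_name)` record shape, the function name, and the heuristic that a client fetching most of the distinct packages is a mirroring script rather than a user. Real log analysis would need a far more careful statistical treatment.

```python
from collections import defaultdict


def distinct_clients_per_package(records, bulk_fraction=0.8):
    """Count distinct non-bulk clients per package.

    records: iterable of (client_id, package_name) pairs, a hypothetical
    shape for parsed sub-mirror log lines.
    bulk_fraction: assumed cut-off; a client that fetched at least this
    share of all distinct packages is treated as a mirroring script.
    """
    packages_by_client = defaultdict(set)
    for client, package in records:
        packages_by_client[client].add(package)

    all_packages = set().union(*packages_by_client.values())
    threshold = bulk_fraction * len(all_packages)

    clients_by_package = defaultdict(set)
    for client, packages in packages_by_client.items():
        # Drop likely mirror scripts. With very few distinct packages in
        # the logs this heuristic would also drop ordinary users.
        if len(packages) >= threshold:
            continue
        for package in packages:
            clients_by_package[package].add(client)

    return {pkg: len(clients) for pkg, clients in clients_by_package.items()}
```

Counting distinct clients rather than raw hits also avoids inflating the number when one machine downloads the same package repeatedly, though it still cannot see the machines behind a local mirror.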
I won't say it's impossible because I have seen quite a bit of statistical magic pull out reality from all sorts of 'noise'. I will say that it is not very easy and that it is probably more effort than most people are willing to put in.
Stephen J Smoogen.
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct