On Sun, 27 Dec 2020 at 17:52, Matthew Miller <mattdm@xxxxxxxxxxxxxxxxx> wrote:
On Sun, Dec 27, 2020 at 07:44:57PM +0100, clime wrote:
> I think we can simply parse server-side access logs to count package
> downloads, no?
We can for our primary server, but most people get updates from mirrors
which we don't run directly. The central mirrorlist (from which I get the
dnf count data) just redirects people to those mirrors. Even if we could get
package download counts from the mirrors, they're heavily skewed by:
* public mirrors pulling the whole thing
* people pulling the whole thing for a private mirror
* ci and build systems (like, running mock)
* mysterious bots downloading stuff for whatever reason
* proxies and caching
There are a couple of other items which make this hard to see, so that even the logs from our primary servers are not useful. When you look at the logs, there is nothing that indicates whether a package is being installed, updated, or pulled in as a dependency. This means that any stats will mostly show which packages get updated the most during a release, or which have a lot of sub-packages that might get pulled in.
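To make that concrete, here is a minimal sketch of counting package downloads from an access log. It assumes the common Apache/nginx "combined" log format and a simplified name-version-release.arch.rpm filename split; the sample lines, paths, user agent, regexes, and the function name count_rpm_downloads are all made up for illustration and are not taken from any Fedora infrastructure. The only fields available are client address, timestamp, path, status, size, and user agent, and none of them say why the package was fetched.

```python
import re
from collections import Counter

# Combined log format (assumed):
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Simplified split of name-version-release.arch.rpm (assumed convention).
RPM_RE = re.compile(
    r'(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)\.rpm$'
)


def count_rpm_downloads(lines):
    """Count successful GETs of .rpm files, keyed by package name."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("method") != "GET" or m.group("status") != "200":
            continue
        filename = m.group("path").rsplit("/", 1)[-1]
        rpm = RPM_RE.match(filename)
        if rpm:
            # Install, update, or dependency? The log line cannot tell us.
            counts[rpm.group("name")] += 1
    return counts


# Hypothetical sample log lines (documentation-range IPs, invented paths).
SAMPLE = [
    '203.0.113.5 - - [27/Dec/2020:17:00:01 +0000] "GET /example/Packages/g/glibc-2.32-2.fc33.x86_64.rpm HTTP/1.1" 200 3500000 "-" "example-agent"',
    '198.51.100.7 - - [27/Dec/2020:17:00:02 +0000] "GET /example/Packages/g/glibc-2.32-2.fc33.x86_64.rpm HTTP/1.1" 200 3500000 "-" "example-agent"',
    '198.51.100.7 - - [27/Dec/2020:17:00:03 +0000] "GET /example/Packages/k/kernel-core-5.9.16-200.fc33.x86_64.rpm HTTP/1.1" 200 31000000 "-" "example-agent"',
    '203.0.113.9 - - [27/Dec/2020:17:00:04 +0000] "GET /example/Packages/f/foo-1.0-1.fc33.noarch.rpm HTTP/1.1" 404 0 "-" "example-agent"',
]

if __name__ == "__main__":
    for name, n in count_rpm_downloads(SAMPLE).most_common():
        print(name, n)
```

The counter ends up keyed on nothing more than the filename, so a fresh install, a routine update, and a dependency pulled in by something else all look the same.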
Mirroring also adds noise: a client may get some of its packages from one mirror and then most of its dependencies from a second mirror.
Finally, CI and build systems swamp all other downloads from mirrors these days. Depending on how they are set up, some seem to do a ```yum install *``` before operating. My guess is that at least 60% of all traffic is CI these days. (I expect that this is also the case for a lot of other distributions.)
Packages with lots of updates sound like they might be getting more interest, but a lot of upstreams do 2-week sprint releases, which means there are lots of regular updates regardless of interest.
All in all, what you get by looking at a mirror's data is a 'reverse popularity contest'. Packages like the kernel, glibc, firefox, and every dependency which gets an update sit on top. Packages at the bottom may be the ones people are actually asking for, but they may also be dependencies which aren't pulled in a lot or don't see an update.
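A toy calculation shows why. Assume, very roughly, that downloads are the number of installed systems times the number of updates pushed in the period. The package names, the numbers, and the helper estimated_downloads below are invented purely for illustration; real traffic also includes mirrors, CI, and caching as described above.

```python
# Hypothetical numbers, for illustration only:
# name -> (installed systems, updates pushed in the period)
packages = {
    "frequently-updated-lib": (1_000, 10),
    "popular-stable-app": (5_000, 1),
}


def estimated_downloads(installed, updates):
    # Assumes each installed system fetches each update exactly once.
    return installed * updates


# Rank by estimated download count, highest first.
ranking = sorted(
    packages,
    key=lambda name: estimated_downloads(*packages[name]),
    reverse=True,
)

if __name__ == "__main__":
    for name in ranking:
        print(name, estimated_downloads(*packages[name]))
```

Under these assumptions the library installed on a fifth as many systems ranks first, simply because it was updated ten times.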
In the end I think popcon might be better, BUT these tools are also hard to set up in these days of trolls and GDPR. [Heck, smolt had almost more trolls in it than regular data by the end of it.. so many people set up PDP-11 and VAX as their hardware running Fedora.]
and probably more. Popcon and smolt are better because it's actual
individual system data. On the other hand, they're worse as mentioned
because opt-in doesn't give a realistic picture.
--
Matthew Miller
<mattdm@xxxxxxxxxxxxxxxxx>
Fedora Project Leader
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Stephen J Smoogen.