F40 Change: Privacy-preserving Telemetry for Fedora Workstation (System-Wide)


Important process note: we are experimenting with using Fedora
Discussion as part of the Changes process. Change announcements (like
the one you are reading right now) will still be sent to the
devel-announce mailing list, but the conversation about each change
will take place on Fedora Discussion at
https://discussion.fedoraproject.org/t/f40-change-request-privacy-preserving-telemetry-for-fedora-workstation-system-wide/85320


This will follow the same process as before, just with discussion in a
different format:
https://docs.fedoraproject.org/en-US/program_management/changes_policy/


You can subscribe to and interact with these conversations by email.
See https://discussion.fedoraproject.org/t/guide-to-interacting-with-this-site-by-email/
for detailed instructions. To make sure you do not miss anything, make
sure that you have the Change Proposal category set to “Watching” —
or, if you just want to get notified about new changes but not every
reply in the conversation, to “Watching First Post”. (Click on the
little bell icon at the top right of the category page.)




The below document represents a proposed Change. As part of the
Changes process, proposals are publicly announced in order to receive
community feedback. This proposal will only be implemented if approved
by the Fedora Engineering Steering Committee.


== Summary ==

The Red Hat Display Systems Team (which develops the desktop) proposes
to enable limited data collection of anonymous Fedora Workstation
usage metrics.

Fedora is an open source community project, and nobody is interested
in violating user privacy. We do not want to collect data about
individual users. We want to collect only aggregate usage metrics that
are actually needed to achieve specific Fedora improvement objectives,
and no more. We understand that if we violate our users' trust, then
we won't have many users left, so if metrics collection is approved,
we will need to be very careful to roll this out in a way that
respects our users at all times. (For example, we should not collect
users' search queries, because that would be creepy.)

We believe an open source community can ethically collect limited
aggregate data on how its software is used without involving big data
companies or building creepy tracking profiles that are not in the
best interests of users. Users will have the option to disable data
upload before any data is sent for the first time. Our service will be
operated by Fedora on Fedora infrastructure, and will not depend on
Google Analytics or any other controversial third-party services. And
in contrast to proprietary operating systems, you can
redirect the data collection to your own private metrics server
instead of Fedora's to see precisely what data is being collected from
you, because the server components are open source too.

Keep in mind this Fedora change proposal is just that: a proposal. It
must undergo community review and must be approved by the
community-elected Fedora Engineering Steering Committee (FESCo) before
it can be implemented, just like any other Fedora change proposal. We
welcome community participation and fully expect this proposal may
need to be modified significantly depending on Fedora community
feedback.

== Owner ==
* Name: [[User:catanzaro|Michael Catanzaro]]
* Email: <mcatanzaro@xxxxxxxxxx>

== Detailed Description ==

We intend to deploy the Endless OS metrics system.
[https://blogs.gnome.org/wjjt/2023/07/05/endless-oss-privacy-preserving-metrics-system/
This blog post] contains a description of how the system works. We do
not plan to deploy the eos-phone-home component in Fedora.

=== How will data collection be approved? ===

The proposal owners feel it is essential to ensure the Fedora
community has ultimate oversight over metrics collection. Community
control is required to maintain user trust. If this change proposal is
approved, then we'll need new policies and procedures to ensure
community oversight over metrics collection and ensure Fedora users
can be confident that our metrics collection does not violate their
privacy.

We can say "we would never collect personally-identifiable data" and
write software that really doesn't collect any such data, but this
alone will never be enough to ensure user confidence. We will need a
metrics collection policy that describes what sort of data may be
collected by Fedora (anonymous, non-invasive), and what sort of data
may not be collected. Such a policy does not exist currently. We will
also want to ensure the Fedora community has ultimate control over
which particular metrics are collected. One option is that each metric
to be collected should be separately approved by FESCo. Collection of
particular metrics in a particular data format is ultimately an
engineering decision, and therefore FESCo seems like an appropriate
approval point. Because FESCo members are elected regularly by the
Fedora community, this also provides the community with ultimate
control over metrics collection via the election process. But other
oversight and approval structures would work too.

=== What data might we collect? ===

We are not proposing to collect any of these particular metrics just
yet, because a process for Fedora community approval of metrics to be
collected does not yet exist. That said, in the interests of maximum
transparency, we wish to give you an idea of what sorts of metrics we
might propose to collect in the future.

One of the main goals of metrics collection is to analyze whether Red
Hat is achieving its goal to make Fedora Workstation the premier
developer platform for cloud software development. Accordingly, we
want to know things like which IDEs are most popular among our users,
and which runtimes are used to create containers using Toolbx.

Metrics can also be used to inform user interface design decisions.
For example, we want to collect the clickthrough rate of the
recommended software banners in GNOME Software to assess which banners
are actually useful to users. We also want to know how frequently
panels in gnome-control-center are visited to determine which panels
could be consolidated or removed. There are other settings we want to
add, but our usability research indicates that the current large
number of settings panels already makes it difficult for users to
find commonly used settings.

Metrics can help us understand the hardware we should be optimizing
Fedora for. For example, our boot performance on hard drives dropped
drastically when systemd-readahead was removed. Ubuntu has maintained
its own readahead implementation, but Fedora does not because we
assume that not many users use Fedora on hard drives. It would be nice
to collect a metric that indicates whether primary storage is a solid
state drive or a hard disk, so we can see actual hard drive usage
instead of guessing. We would also want to collect hardware
information, for example the laptop model ID, that would be useful
for collaboration with hardware vendors such as Lenovo.

Other Fedora teams may have other metrics they wish to collect. For
example, Fedora localization wishes to count users of particular
locales to evaluate which locales are in poorer shape relative to
their usage.

This is only a small sample of what we might want to know; no doubt
other community members can think of many more interesting data points
to collect. But note the purpose of all of the above metrics is to
inform specific design decisions, not to build tracking profiles. We
only need to collect data in aggregate, and have no need to associate
the data we collect with particular users.

=== Metrics transparency ===

Transparency is required to provide confidence that Fedora metrics
collection is not creepy or invasive. Since Fedora is open source, a
developer can review the source code to verify exactly what it is
doing and what data is being collected. But most Fedora users are not
software developers, and few software developers have time or
inclination to review the source code of the operating system to see
what it is doing. To retain user trust, we need an easy way for users
to understand exactly what data we are collecting. We propose to
maintain a documentation page showing the current metrics database
schema, so users can see exactly which fields are in the database and
what example data looks like.

Experienced users may gain additional confidence by building and
running their own metrics collection server; all of the components of
the server (discussed below) are open source, and we will provide
instructions for how to run a simple server yourself and view its
metrics database. You can redirect metrics from Fedora's server to
your own by changing a URL in a configuration file.

=== User control ===

A new metrics collection setting will be added to the privacy page in
gnome-initial-setup and also to the privacy page in
gnome-control-center. This setting will be a toggle that will enable
or disable metrics collection for the entire system. We want to ensure
that metrics are never submitted to Fedora without the user's
knowledge and consent, so the underlying setting will be off by
default in order to ensure metrics upload is not unexpectedly turned
on when upgrading from an older version of Fedora. However, we also
want to ensure that the data we collect is meaningful, so
gnome-initial-setup will default to displaying the toggle as enabled,
even though the underlying setting will initially be disabled. (The
underlying setting will not actually be enabled until the user
finishes the privacy page, to ensure users have the opportunity to
disable the setting before any data is uploaded.) This is to ensure
the system is opt-out, not opt-in. This is essential because we know
that opt-in metrics are not very useful. Few users would opt in, and
these users would not be representative of Fedora users as a whole. We
are not interested in opt-in metrics.

There is one subtlety: metrics collection is separate from uploading.
Collection is always initially enabled, while
uploading is always initially disabled. The graphical toggle enables
or disables both at the same time. That is, a newly-installed Fedora
system will always collect metrics locally at first, but the collected
metrics will be deleted and never submitted to Fedora if the user
disables the metrics collection toggle on the privacy page. If the
user leaves the toggle enabled, then the collected metrics may be
submitted only after finishing the privacy page.

Metrics uploading will be opt-in for users who upgrade from previous
versions of Fedora Workstation, because we don't yet have a mechanism
to ask the user to consent to data collection after a system upgrade
like we do for new installations. Metrics collection, however, will be
opt-out. That is, an upgraded system will collect metrics locally but
will never submit them to Fedora unless the user opts in. If the user
visits the privacy page in gnome-control-center, then both collection
and uploading will be either enabled or disabled depending on the
user's selection. Unlike gnome-initial-setup, the switch in
gnome-control-center will default to off if the user has not seen the
switch in gnome-initial-setup and has not previously selected a value
for the setting.

This might sound complicated, but it is consistent. If the user has
not yet made a decision whether to allow telemetry, we collect it
locally so that it's ready to submit if the user approves telemetry in
the future, but we never upload it. Once the user makes a decision,
then we either upload it or delete it and stop collecting.
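
The rules in this section can be summarized as a small state model. The following Python sketch is only an illustration of the behavior described above, not code from eos-event-recorder-daemon; the names `Consent`, `MetricsState`, and all of their methods are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Consent(Enum):
    UNDECIDED = "undecided"  # the user has not yet answered the toggle
    ALLOWED = "allowed"      # the user left the toggle enabled
    DENIED = "denied"        # the user switched the toggle off


@dataclass
class MetricsState:
    """Hypothetical model of the collection and upload rules."""
    consent: Consent = Consent.UNDECIDED
    local_events: List[str] = field(default_factory=list)

    def collection_enabled(self) -> bool:
        # Collect locally unless the user has explicitly declined.
        return self.consent is not Consent.DENIED

    def upload_enabled(self) -> bool:
        # Upload only after the user has explicitly approved.
        return self.consent is Consent.ALLOWED

    def record(self, event: str) -> None:
        if self.collection_enabled():
            self.local_events.append(event)

    def decide(self, allow: bool) -> None:
        self.consent = Consent.ALLOWED if allow else Consent.DENIED
        if not allow:
            # Declining deletes everything collected so far.
            self.local_events.clear()

    def pending_upload(self) -> List[str]:
        # Nothing is eligible for upload until consent is ALLOWED.
        return list(self.local_events) if self.upload_enabled() else []
```

In this model, an upgraded system and a fresh installation both start in `UNDECIDED`; they differ only in which value the graphical switch shows before the user makes a choice.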

=== GDPR ===

It is Fedora Legal's obligation to ensure our data collection complies
with legal requirements in the jurisdictions in which Red Hat
operates. This is not an obligation of the Fedora community, so there
is no need to discuss GDPR rules on our mailing lists. The proposal
owners will not respond to mailing list posts that discuss GDPR or
similar legal obligations during this change proposal discussion. In
short, let's keep discussion focused on what Fedora SHOULD or SHOULD
NOT do, rather than what we MUST or MUST NOT do.

That said, Fedora Legal has determined that if we collect any
personally-identifiable data, the entire metrics system must be
opt-in. Since we are only interested in opt-out metrics due to the low
value of opt-in metrics, we must accordingly never collect any
personally-identifiable data. We must also not collect any data that
could become personally-identifiable if combined with other data,
which notably means IP addresses must not be stored. We only want to
collect anonymous data anyway, but we need to be especially mindful of
the possibility that combining two "anonymous" data points could
result in the data no longer being anonymous.
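
To illustrate how two individually harmless fields can become identifying when combined, here is a minimal Python sketch that reports the size of the smallest group of records sharing the same values. The field names and sample records are invented for illustration and are not part of any proposed metrics schema.

```python
from collections import Counter
from typing import Dict, Iterable, List

Record = Dict[str, str]


def smallest_group(records: Iterable[Record], fields: List[str]) -> int:
    """Return the size of the smallest group of records that share
    the same values for all of the given fields (0 if no records)."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values()) if counts else 0


# Hypothetical sample data: locale and laptop model per record.
records = [
    {"locale": "en_US", "model": "A"},
    {"locale": "en_US", "model": "A"},
    {"locale": "en_US", "model": "B"},
    {"locale": "en_US", "model": "B"},
    {"locale": "ga_IE", "model": "A"},
    {"locale": "ga_IE", "model": "B"},
    {"locale": "ga_IE", "model": "B"},
]

alone_locale = smallest_group(records, ["locale"])        # 3
alone_model = smallest_group(records, ["model"])          # 3
combined = smallest_group(records, ["locale", "model"])   # 1
```

Each field on its own is shared by at least three records, but one locale and model combination occurs exactly once. This is the kind of check a metrics review might apply before approving metrics to be correlated.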

=== Fedora data collection policy ===

Fedora Legal requires that we publish a Fedora data collection policy
separate from the existing
[https://fedoraproject.org/wiki/Legal:PrivacyPolicy Fedora Privacy
Policy], which is designed to address usage of Fedora websites. This
is currently a work in progress that we're not quite ready to share
yet. You can expect it to be very short and very generic.

=== Metrics server infrastructure ===

We propose to deploy Azafea, the open source metrics collection server
used by Endless OS. An Azafea deployment consists of five components:
an nginx proxy server,
[https://github.com/endlessm/azafea-metrics-proxy
azafea-metrics-proxy], redis, [https://github.com/endlessm/azafea
azafea itself], and a Postgres database. nginx proxies HTTP requests
to azafea-metrics-proxy, which is itself a simple HTTP server that
adds metrics into the redis database, where they will be fetched by
Azafea and stored into Postgres. We will provide instructions on how
to set up your own server and see for yourself what data gets
collected.
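
The data flow described above can be sketched with standard-library stand-ins. This is not Azafea's code or wire format: the deque stands in for redis, the list stands in for the Postgres table, and the JSON payload layout is an assumption made purely for illustration.

```python
import json
from collections import deque
from typing import Deque, Dict, List


def proxy_receive(body: bytes, queue: Deque[bytes]) -> None:
    """Stand-in for azafea-metrics-proxy: enqueue the request body.
    Note that nothing about the sender (such as an IP address) is kept."""
    queue.append(body)


def worker_drain(queue: Deque[bytes], table: List[Dict[str, object]]) -> int:
    """Stand-in for Azafea: pop queued payloads, parse them, and
    store one row per event. Returns the number of rows stored."""
    stored = 0
    while queue:
        payload = json.loads(queue.popleft())
        for event in payload["events"]:  # hypothetical payload layout
            table.append({"name": event["name"], "value": event["value"]})
            stored += 1
    return stored
```

Decoupling the HTTP-facing proxy from the database writer through a queue means the proxy can stay very simple, which matches the description of azafea-metrics-proxy as "a simple HTTP server".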

=== Metrics client infrastructure ===

The client side consists of [https://github.com/endlessm/eos-metrics
eos-metrics], [https://github.com/endlessm/eos-event-recorder-daemon
eos-event-recorder-daemon], and
[https://github.com/endlessm/eos-metrics-instrumentation
eos-metrics-instrumentation]. eos-metrics is a D-Bus interface that
applications and services may use to record events, plus a GObject
library that provides a simple API around the D-Bus interface.
eos-event-recorder-daemon is the service that actually implements this
interface: it collects incoming metrics, batches them together, and
sends them to the metrics server at predefined intervals.
eos-metrics-instrumentation is the component that actually collects
specific metrics. Originally, we had planned to not use this component
and instead write our own fedora-metrics-instrumentation that would
collect only a few particular metrics that are approved via Fedora
community process. However, currently we are planning to ship
eos-metrics-instrumentation and instead ensure that it is not
collecting more metrics than would be acceptable to the Fedora
community. A review process to decide which metrics to collect and
which metrics to disable will be required.
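
As a rough picture of the batching behavior attributed to eos-event-recorder-daemon above, the following Python sketch buffers events and hands them to a sender once a fixed interval has elapsed. It is not the daemon's implementation; `BatchingRecorder`, its parameters, and the injected clock are hypothetical.

```python
import time
from typing import Callable, List, Tuple

Event = Tuple[str, float]  # (event name, client-side timestamp)


class BatchingRecorder:
    """Hypothetical recorder: buffer events, send them in batches."""

    def __init__(
        self,
        send: Callable[[List[Event]], None],
        interval_s: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._send = send
        self._interval_s = interval_s
        self._clock = clock
        self._buffer: List[Event] = []
        self._last_flush = clock()

    def record(self, name: str) -> None:
        # Events are only buffered here; nothing is sent yet.
        self._buffer.append((name, self._clock()))

    def tick(self) -> bool:
        """Send the buffered batch if the interval has elapsed.
        Returns True only if a non-empty batch was sent."""
        now = self._clock()
        if now - self._last_flush < self._interval_s:
            return False
        self._last_flush = now
        if not self._buffer:
            return False
        batch, self._buffer = self._buffer, []
        self._send(batch)
        return True
```

A component such as gnome-control-center would only call something like `record()`; the decision of when to contact the server stays in one place.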

=== Data set considerations ===

Although we assume the metrics server administrator is not malicious
and will not actively attempt to deanonymize users, we will still take
reasonable precautions to make it difficult to correlate metrics to a
particular user, starting by not storing any IP address information in
the metrics database. Additionally, each metric that we collect will
be considered individual, non-correlatable data by default, unless
approved to be correlated with particular other metrics via future
Fedora community process. That is, if a user submits two data points,
we usually don't want the ability to know that these data points were
both submitted by the same user.

Each metric is stored in the database with a Unix timestamp indicating
when it was generated on the client. If abused, this timestamp could
allow correlation of data points that are collected at the same time
as each other, or at a fixed time offset to other events. For example,
if the system were designed to collect two metrics exactly 300 seconds
after the system was booted, then just looking at the timestamps
would be enough to determine that two metrics recorded at the same
time were submitted by the same user. Accordingly, we should consider
modifying the metrics server to reduce timestamp granularity at least
somewhat.
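
One simple way to reduce timestamp granularity is to round each timestamp down to the start of a fixed-size bucket before storing it. The Python sketch below assumes a one-hour bucket; both the function name and the bucket size are illustrative choices, not part of Azafea.

```python
def coarsen_timestamp(unix_seconds: int, bucket_seconds: int = 3600) -> int:
    """Round a Unix timestamp down to the start of its bucket."""
    if bucket_seconds <= 0:
        raise ValueError("bucket_seconds must be positive")
    return unix_seconds - (unix_seconds % bucket_seconds)


# Two events recorded 300 seconds apart land in the same one-hour bucket.
first = coarsen_timestamp(1_700_000_000)   # 1699999200
second = coarsen_timestamp(1_700_000_300)  # 1699999200
```

Coarsening only helps if enough other users submit events within the same bucket, so the bucket size would need to be chosen with the expected volume of submissions in mind.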

=== History ===

Currently Fedora's only form of metrics collection is
[https://fedoraproject.org/wiki/Changes/DNF_Better_Counting DNF Better
Counting], but this only counts Fedora installations. That is useful,
but we want to count more than just how many users we have.

Fedora's first metrics collection attempt was
[https://en.wikipedia.org/wiki/Smolt_(Linux) Smolt], a precursor to
hw-probe which collected data on user hardware. The current proposal
is different from Smolt because it will collect more than just
hardware data, and also because Smolt collected only opt-in data. The
current proposal would be opt-out, not opt-in.

This change proposal will likely be compared to the Ubuntu spyware
complaints from a decade ago, when Ubuntu desktop users' search
queries were sent to Amazon by default. Let's not do that.

== Feedback ==

We will endeavor to update this section of the change proposal to
include a summary of Fedora community discussion of this proposal.

== Benefit to Fedora ==

The main benefit to Fedora is that we will be able to use collected
metrics to inform design decisions. It is very common for developers
to wish to know something about how Fedora software is used, and we
will finally have a way to answer such questions.

Occasionally, Red Hat might need to collect specific metrics to
justify additional time spent on contributing to Fedora or additional
investment in Fedora.

== Scope ==

* Proposal owners:
This change requires substantial technical and nontechnical work from
the change owners. Most notably, we will need to package eos-metrics,
eos-event-recorder-daemon, and eos-metrics-instrumentation properly
for Fedora; they are currently packaged in a copr. We also still need
to modify eos-metrics-instrumentation so that it does not send events
not approved for use in Fedora, as we expect to collect less data than
Endless OS.

* Other developers:
This proposal will require substantial effort by Community Platform
Engineering (CPE) to host the metrics server infrastructure.

* Release engineering: [https://pagure.io/releng/issues/11514 #11514]

* Policies and guidelines: New processes and guidelines are proposed
above under the section "How will data collection be approved?"

* Trademark approval: N/A (not needed for this Change)

* Alignment with Objectives: This change does not align with any
current [https://docs.fedoraproject.org/en-US/project/initiatives/
Fedora Initiatives], which are very limited in scope. That said, one
of the main purposes of metrics collection is to determine whether we
are achieving other objectives not listed on the wiki page. For
example, we want Fedora Workstation to become the premier developer
workstation operating system. To that end, we want to know how many of
our users are using particular IDEs.

== Upgrade/compatibility impact ==

We would like to enable metrics upload for upgraded systems, but this
isn't trivial because we want to obtain user consent before enabling
metrics upload. This would require us to design a user interface that
would run on upgraded systems and present the setting to users. We
have not yet created such a user interface, so for now metrics upload
will need to default to disabled for systems upgraded from older
versions of Fedora. Since the underlying setting will be off by
default, we don't need to do anything special to achieve this.

== How To Test ==

The ultimate goal is to see metrics appear in the Postgres database of
a metrics server, but configuring and running the server is not
trivial. Accordingly, we propose to publish a separate document
detailing how to set up and configure a metrics server for testing
purposes, how to redirect metrics to the custom server, and how to
force the client to immediately submit metrics to ease testing.
Although we don't actually expect many community members to seriously
run their own metrics servers, we still want to document the steps
involved so that interested developers can see exactly how it works.

== User Experience ==

A new metrics collection setting will be added to the privacy page in
gnome-initial-setup and also to the privacy page in
gnome-control-center. This setting will be a simple toggle that will
enable or disable all metrics upload for the entire system. Users who
do not want any metrics upload should feel confident that uploading
can be disabled with a simple toggle.

Fedora users should be confident that Fedora metrics collection
respects their privacy and collects only limited, anonymous usage
data.

== Dependencies ==

Any package that wishes to collect a metric would need to depend on
eos-metrics. For example, if we were to collect statistics on which
system settings panels are used most frequently, then the
gnome-control-center package would need to depend on eos-metrics in
order to send a metric to eos-event-recorder-daemon.

== Contingency Plan ==

* Contingency mechanism: We would need to remove the eos-metrics,
eos-event-recorder-daemon, and eos-metrics-instrumentation packages
from the workstation-product comps group, and rebuild any packages
that gained a dependency on eos-metrics.
* Contingency deadline: Beta freeze
* Blocks release? Yes, if the change is incomplete, it will need to be
reverted before release.

== Documentation ==

This feature will depend on several different upstream projects with
varying amounts of documentation.

The client side consists of eos-metrics, eos-event-recorder-daemon,
and eos-metrics-instrumentation. The best documentation of eos-metrics
available online is its
[https://github.com/endlessm/eos-metrics/blob/master/data/com.endlessm.Metrics.xml
D-Bus interface XML]. eos-metrics also contains normal API
documentation that will be built and installed in a docs subpackage,
but this is not currently available online. The
eos-event-recorder-daemon and eos-metrics-instrumentation components
do not appear to have any online documentation.

On the server end, the metrics server consists of azafea-metrics-proxy
feeding metrics into redis, where they will be pulled by azafea and
then added to a Postgres database. Documentation for
[https://github.com/endlessm/azafea-metrics-proxy/tree/master/docs/source
azafea-metrics-proxy] and
[https://github.com/endlessm/azafea/tree/master/docs/source azafea]
can be reviewed online.
[https://azafea.readthedocs.io/en/latest/events.html Events recognized
by the server are documented here.] Note that this documentation is
currently focused on use by Endless OS rather than by Fedora, and
includes documentation of many events that are no longer sent by
Endless OS. This change proposal does not propose to enable sending
any particular events in Fedora.

== Release Notes ==

Release notes are not required for the initial proposal. We need to
write the release notes before change freeze.



-- 
Aoife Moloney

Product Owner

Community Platform Engineering Team

Red Hat EMEA

Communications House

Cork Road

Waterford
_______________________________________________
devel-announce mailing list -- devel-announce@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-announce-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel-announce@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue



