F42 Change Proposal: Opt-In Metrics for Fedora Workstation (system-wide)

Wiki - https://fedoraproject.org/wiki/Changes/Metrics
Discussion thread -
https://discussion.fedoraproject.org/t/f42-change-proposal-opt-in-metrics-for-fedora-workstation-system-wide/124325

== Summary ==

The goal of this change proposal is to provide the Fedora community
with accurate, representative data about the real world use of Fedora
Workstation. By doing this, we believe that we can accelerate the
development of Fedora Workstation, and ensure that it improves in line
with our users’ needs and requirements.

'''Protecting user privacy is of utmost importance for this
initiative.''' To this end, the service will only collect generic,
standardized data, and will never collect anything that is personally
identifying. It will also, of course, be fully open source. On the
server side, the data will be stored in a way that prevents user
identification.

'''Another important aspect of the initiative is that it will be run
in a transparent manner, and will be governed as part of the Fedora
project.''' A new SIG will be responsible for the service, and will be
open to community participation. It will publish analyses of the data
which has been collected, provide documentation about how the service
operates, will share samples of the database data, and will respond to
requests from the community.

'''Finally, we intend to ensure that metrics reporting is fully under
the control of end users.'''  Metrics collection will default to off,
and will only be enabled through a clear on/off prompt in initial
setup. Users will be able to view the data that has been collected
locally, and will be able to remove the client software from their
systems, should they choose to do so.

To address concerns that the community might have, the change owners
have created a [https://pagure.io/fedora-workstation/blob/master/f/notes/metrics-privacy-transparency.md
privacy and transparency checklist], which will be updated as the
initiative progresses.

== Owners ==

* Name: [[User:aday|Allan Day]]
* Name: [[User:catanzaro|Michael Catanzaro]]
* Name: [https://docs.fedoraproject.org/en-US/workstation-working-group/
Fedora Workstation Working Group]

== Current status ==

The proposal is to deploy a pre-existing data collection system -
called Azafea - for Fedora Workstation. Azafea has both client and
server components. Significant work is required to make a wide scale
deployment of Azafea possible (see scope section below).

This updated proposal obsoletes [[Changes/Telemetry | the original proposal]].

* Targeted release: [[Releases/42 | Fedora Linux 42]]


== Detailed Description ==

This section includes a detailed description of each aspect of the
metrics proposal.

=== Data that will be collected ===

All collected data will be anonymous:

* We will not collect identifying information, such as email
addresses, online account details, or IP addresses.
* We will only collect generic, standardized information. For example,
we want to collect data on which apps are used, but we will never
collect data on which websites are viewed or which files are opened.
* Server side, each metric will be stored separately and will not be
linked to other metrics from the same system. This will prevent user
fingerprinting through the cross-referencing of anonymous information.
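
As a rough illustration of the last point, the Python sketch below splits one uploaded batch into independent records before storage. The function name `split_batch` and the field names are hypothetical and are not taken from Azafea; the point is only that nothing tying the records to each other (a batch ID, a machine ID, or the original ordering) survives.

```python
import random

def split_batch(batch):
    """Turn one uploaded batch into independent per-metric records.

    `batch` is a hypothetical dict of the form
    {"metrics": [{"name": ..., "value": ...}, ...]}.
    Only the name and value of each metric are kept.
    """
    records = [{"name": m["name"], "value": m["value"]} for m in batch["metrics"]]
    # Shuffle so that storage order does not hint at a shared origin.
    random.shuffle(records)
    return records
```

Each returned record would then be written to its own table or row, with no shared key between them.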

All of the code in the data collection system will be open source and
available for public inspection.

The data we plan on collecting will fall into the following categories:

{| class="wikitable" style="margin:auto"
|-
! Category !! Examples
|-
| Hardware details || CPU, graphics, cameras, which peripherals are present.
|-
| System settings || The display language, which input methods are
used, which accessibility features are enabled.
|-
| Desktop usage patterns || Which apps are used, how many open
workspaces there are, how often each system settings panel is opened.
|-
| Performance reports || Disk and memory usage.
|-
| Evidence of problems || Counts of system crashes, OOM events, app crashes.
|}

For more detailed information, see the
[https://pagure.io/fedora-workstation/blob/master/f/notes/metrics-to-collect.md
preliminary list of metrics that we want to collect]. This list
indicates the purpose of each metric that we hope to collect.

=== Steps to ensure anonymity ===

The [https://pagure.io/fedora-workstation/blob/master/f/notes/metrics-to-collect.md
metrics that we hope to collect] are all generic in nature, and do not
contain personal or identifying information.

To prevent accidental collection of identifying information, the data
we collect will be filtered on the client side, so that only known,
standardized variables are included. For example, when recording which
apps are used, we will only record known package names, in order to
prevent custom apps with identifying metadata from being recorded.
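
A minimal Python sketch of this kind of client-side filter is shown below. The allowlist contents and the function name are illustrative assumptions, not part of the actual client code.

```python
# Hypothetical allowlist of known package names.
KNOWN_PACKAGES = frozenset({"firefox", "gnome-terminal", "nautilus"})

def filter_app_events(app_names, known=KNOWN_PACKAGES):
    """Keep only app names on the allowlist; silently drop everything else."""
    return [name for name in app_names if name in known]

print(filter_app_events(["firefox", "alice-payroll-tool", "nautilus"]))
# → ['firefox', 'nautilus']
```

Because unknown names are dropped rather than recorded in some generic form, a custom app with an identifying name never reaches the upload queue.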

Wherever possible, the system will aggregate data locally prior to
upload. For example, it can report the number of times that a feature
was used in a week, instead of the exact time whenever it is used.
This method further increases anonymity by reducing the precision of
the data that is reported.
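
The idea can be sketched in Python as follows. This is only an illustration under assumed names (`weekly_counts`, the `(feature, timestamp)` event shape); the real client may aggregate differently.

```python
from collections import Counter
from datetime import datetime

def weekly_counts(events):
    """Reduce (feature, timestamp) events to per-week usage counts.

    The result is keyed by (ISO year, ISO week, feature), so the exact
    time of each use is discarded before anything is uploaded.
    """
    counts = Counter()
    for feature, when in events:
        iso = when.isocalendar()  # (year, week, weekday)
        counts[(iso[0], iso[1], feature)] += 1
    return counts

events = [
    ("screenshot", datetime(2025, 1, 6, 9, 30)),
    ("screenshot", datetime(2025, 1, 8, 17, 5)),
    ("screenshot", datetime(2025, 1, 14, 11, 0)),
]
counts = weekly_counts(events)
```

Here the three timestamps collapse into two counters, one per ISO week.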

We will only deploy the service once it has undergone a thorough
period of testing, during which we will verify that the database is
only being populated with anonymous data. (Data from the testing phase
of the system will be permanently deleted.)

=== How metrics data will be used ===

We anticipate that the data we collect will drive myriad improvements
within Fedora as well as the wider ecosystem. These improvements
include:

'''Resource prioritization''' - knowing which hardware, features and
apps are used most will allow developers and partners to focus their
efforts where they will have the most impact.

'''Software improvements''' - data about usage and performance
patterns can drive optimisations in existing software, in terms of
both technical and UX design.

'''Configuration enhancements''' - decisions about default settings
and the default composition of Fedora Workstation can be based on
observed usage patterns.

'''Better development practices''' - we aim to promote and encourage
user and data driven development practices through this work.

To achieve these impacts, analysis of the collected data will be
published and circulated to the relevant developers and projects.

=== Who will have access to metrics data ===

In the interests of transparency, we will put the following mechanisms
in place for viewing the data that is collected:

# Raw data from the database will be published during the testing
phase, prior to wide scale deployment
# Members of the community will be able to join the metrics SIG, in
order to get full ongoing access to the data
# After deployment, a randomly selected sample of the database will be
published (once it has been manually checked)
# Members of the community will be able to request copies of the
database from the SIG, which will be shared privately

This proposal is an attempt to balance the need to protect privacy
with the need to provide transparency. We have a high degree of
confidence that the database will only contain anonymous data (see
“Steps to ensure anonymity”). However, there is always some risk that
something could go
wrong with data collection. Out of an abundance of caution, we
therefore only want to share data once it has been manually checked.

=== Approval for changes to the metrics system ===

Any changes to the metrics system and its governance arrangements will
require approval by FESCo. This will include any changes to:

* the metrics data that is collected
* the metrics SIG (its rules, role, composition, membership terms)
* the technology used
* the UI for user opt in/opt out
* the hosting of the infrastructure or the involvement of 3rd parties

=== User control ===

The proposed system aims to ensure that users are always in control of
metrics collection on their systems. This will be achieved through the
following:

* The setting for metrics collection will enable/disable both local
metrics collection and data upload
* Metrics collection will be off by default
* Metrics collection will only be enabled through an explicit opt in
from the user, which will be presented as part of initial setup
* It will always be possible for users to disable metrics collection
from the system settings
* It will be possible for users to view the metrics that have been
collected locally on their systems
* It will be possible for users to remove the metrics collection
components from their systems, using dnf
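
The first few points can be sketched in Python as a single flag that gates both recording and upload. The class below is hypothetical and does not reflect the eos-event-recorder-daemon implementation; in particular, discarding unsent data when the user opts out is an assumption made for the sketch.

```python
class MetricsRecorder:
    """Hypothetical recorder: one setting gates local recording and upload."""

    def __init__(self):
        self.enabled = False  # off by default; only an explicit opt-in changes this
        self._pending = []

    def set_enabled(self, enabled):
        self.enabled = bool(enabled)
        if not self.enabled:
            # Assumption: opting out also discards data not yet uploaded.
            self._pending.clear()

    def record(self, name, value):
        """Store an event locally, but only if the user has opted in."""
        if self.enabled:
            self._pending.append((name, value))

    def pending(self):
        """Return a copy of the locally collected data for the user to inspect."""
        return list(self._pending)

    def upload(self, send):
        """Pass the pending batch to `send` and return how many events were sent."""
        if not self.enabled or not self._pending:
            return 0
        batch, self._pending = self._pending, []
        send(batch)
        return len(batch)
```

With this shape there is no state in which data is collected locally while upload is disabled, which is the property the list above asks for.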

=== Metrics system components ===

The metrics system would be composed of server and client Azafea components.

An Azafea server deployment consists of five components:

# an nginx proxy server
# azafea-metrics-proxy
# redis
# azafea itself
# a Postgres database

nginx proxies HTTP requests to azafea-metrics-proxy, which is itself a
simple HTTP server that adds batches of metrics into the redis
database, where they will be fetched by Azafea and stored into
Postgres.
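
The flow described above can be modelled with a short Python sketch. It uses in-memory stand-ins (a `deque` for redis, a list for Postgres) and hypothetical function names; it does not use or reproduce the real Azafea code.

```python
from collections import deque

def proxy_receive(queue, batch):
    """Role of azafea-metrics-proxy: accept an uploaded batch and enqueue it."""
    queue.append(batch)

def azafea_process(queue, database):
    """Role of Azafea: drain the queue and store each metric as its own row."""
    stored = 0
    while queue:
        batch = queue.popleft()
        for metric in batch:
            database.append(metric)
            stored += 1
    return stored

queue = deque()   # stand-in for the redis queue
database = []     # stand-in for the Postgres tables

proxy_receive(queue, [("cpu", "x86_64"), ("language", "en_GB")])
proxy_receive(queue, [("cpu", "aarch64")])
stored = azafea_process(queue, database)
```

Decoupling the two halves with a queue means the HTTP-facing proxy stays simple and can keep accepting uploads even if processing falls behind.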

The client side consists of the following components:

* eos-metrics - a D-Bus interface that applications and services may
use to record events, plus a GObject library that provides a simple
API around the D-Bus interface
* eos-event-recorder-daemon -  the service that actually implements
the D-Bus interface: it collects metrics recorded via D-Bus, batches
them together, and sends them to the metrics server at predefined
intervals
* eos-metrics-instrumentation - the component that calls D-Bus methods
on eos-event-recorder-daemon

== Feedback ==

The [[Changes/Telemetry|initial version of this proposal]] generated a
huge amount of feedback and debate. We have put a lot of time and
effort into engaging with this feedback, and the proposal has been
substantially changed in response to it. We are grateful to the Fedora
community for enabling us to improve the proposal in this way.

We know that there were issues with the original proposal, and that
these led to serious concerns amongst the community. We hope that the
updated proposal addresses these concerns, and look forward to
receiving further feedback.

The following is a summary of the key points from the discussion so
far, along with details of the steps that have been taken in response
to them. Additional information is also included in the
[https://pagure.io/fedora-workstation/blob/master/f/notes/metrics-faq.md
FAQ]. If we have missed something from that discussion, please let us
know.

=== Opt in or opt out? ===

The original proposal specified that metrics upload would be disabled
by default, and that the UI setup would include an on-by-default
switch to allow users to opt out. This aspect of the proposal
attracted by far the most negative feedback.

As a result of this feedback, we have changed the proposal: we now
propose that initial setup will show an explicit yes/no prompt which
has no default value.

We recognise that feedback about the opt-out UI reflected wider
concerns about the privacy and transparency of the metrics system,
which we have addressed through other changes.

=== Proposal omissions ===

We received feedback that the original proposal omitted key details,
including:

* The benefit to Fedora
* Which metrics will be collected
* That each metric will be stored separately and will not be correlated
* How members of the community will be able to access the database
* Whether users will be able to view the local data that has been
collected on their systems
* That the metrics packages can be removed using DNF
* The policy through which the collection of specific metrics will be approved

This information has now been added to the proposal.

=== Ability to view the entire data set ===

This was a frequent request in the feedback we received. We understand
the motivation to have transparency and to verify what data is being
collected.

“Who will have access to metrics data” contains an updated proposal which
we hope will satisfy this desire while also preventing potential
privacy issues.

=== Risks to anonymity if the metrics server is hacked ===

This was another major subject of discussion, with various concerns
being raised.

We are confident that it will not be possible for the administrators
of the metrics system to identify or fingerprint users under normal
operation of the metrics server. We also want to emphasize the generic
nature of the metrics we want to collect.

We have also committed to:

* Take steps to minimize risks, such as having short retention of server logs
* Manage the server through the metrics SIG, so that members of the
community can contribute their expertise
* Document the infrastructure setup for the metrics server once it has
been set up, in order to solicit further feedback

These points have been added to our
[https://pagure.io/fedora-workstation/blob/master/f/notes/metrics-privacy-transparency.md
privacy and transparency checklist].

The metrics server will not store IP addresses or entire batches of
metrics data. However, we acknowledge that, if Fedora infrastructure
is compromised, an attacker could begin recording this information. We
acknowledge this as a risk of the system.

=== Local data collection ===

The original proposal specified that local data collection would
default to on, while upload of that data would default to off. Some
pointed out that this would be a privacy risk.

In the new version of the proposal, local data collection will only be
enabled after the user has consented to metrics collection.

=== Other suggestions ===

We received various other suggestions during the debate about the
original change proposal. These included:

==== Provide fine-grained user control over which data is uploaded ====

This would add complexity to the system and to data analysis. We are
also unsure how much these fine-grained controls would be used in
practice. This is not something that we are rejecting outright, but it
is unlikely to be something that we ourselves would be able to add to
the initial version of the system.

==== Only collect some metrics for a fixed time period ====

We agree that this makes sense for some metrics and we have added this
to our [https://pagure.io/fedora-workstation/blob/master/f/notes/metrics-privacy-transparency.md
privacy and transparency checklist], as a future work item.

==== Restrict metrics collection to a small sample of users ====

The main issues with this approach would be ensuring that the sample
is representative, and our ability to detect issues experienced by
subsets of the user base.

==== Collaborate with a trusted third party ====

The idea behind this suggestion was for us to get additional oversight
and input from an organization that has expertise in data privacy
issues. We’d be very happy to do this, but are unsure who that third
party would be. We are open to suggestions!

==== Adopt differential privacy techniques ====

[https://desfontain.es/blog/friendly-intro-to-differential-privacy.html
Differential privacy] would potentially allow Fedora systems to submit
inaccurate data to the metrics server, while ensuring the overall data
set is still representative and useful. We would welcome collaboration
from Fedora community members interested in improving the metrics
collection system to adopt such techniques.
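
To illustrate the general idea, the Python sketch below implements randomized response, a classic technique that satisfies local differential privacy for a single yes/no metric. It is not something this proposal commits to; the function names and the `p_truth` parameter are assumptions chosen for the example.

```python
import random

def randomized_response(truth, rng, p_truth=0.75):
    """Report the true bit with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(reports, p_truth=0.75):
    """Recover the population rate from noisy reports.

    Inverts: observed = p_truth * true_rate + (1 - p_truth) * 0.5
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

# Simulate 20,000 systems, 30% of which really have the feature enabled.
rng = random.Random(0)
truths = [rng.random() < 0.3 for _ in range(20000)]
reports = [randomized_response(t, rng) for t in truths]
estimate = estimate_true_rate(reports)
```

No individual report can be trusted, since any system may have answered with a coin flip, yet the estimate over many systems stays close to the true rate.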

== Benefit to Fedora ==

See “How metrics data will be used”.

== Scope ==

* Proposal owners: this change requires substantial technical and
nontechnical work from the change owners. This will include:
** Properly packaging eos-metrics, eos-event-recorder-daemon, and
eos-metrics-instrumentation for Fedora
** Modifying eos-metrics-instrumentation so that it does not send
events that are not approved for use in Fedora
** Creation of the metrics SIG and its various policies and procedures
** Documentation for end users and members of the community
* Other developers: Community Platform Engineering (CPE) will need to
host the metrics server infrastructure.
* Release engineering: [https://pagure.io/releng/issues/11514 #11514]
* Policies and guidelines: see "Approval for changes to the metrics system"
* Trademark approval: N/A (not needed for this change)
* Alignment with objectives: there are currently no
[https://docs.fedoraproject.org/en-US/project/initiatives/ Fedora
Initiatives]. However, the generated data will be broadly applicable
to Fedora community activities.

== Upgrade/Compatibility Impact ==

There are no special technical challenges in this regard.

Metrics collection will only be enabled in response to an explicit
opt-in by the user, through a UI in either gnome-initial-setup or
gnome-control-center. gnome-initial-setup is only shown for new
installs, meaning that the only way to enable metrics on an upgraded
system would be through gnome-control-center.

== How to Test ==

Testing is not currently possible. Instructions will be provided when
this changes.

== User Experience ==

The user experience for the system will consist of:

# In initial setup, a UI to choose between metrics collection being on
or off. There will be no default in the UI and users will have to
explicitly choose one of the two options.
# In the privacy Settings, a switch to turn metrics collection on or off
# User documentation about the service
# A method to view locally collected metrics data

== Dependencies ==

Packages wanting to collect metrics data will need to depend on
eos-metrics. For example, to collect statistics about Settings usage,
the gnome-control-center package would need to depend on eos-metrics
in order to send a metric to eos-event-recorder-daemon.

== Contingency Plan ==

* Contingency mechanism: remove the eos-metrics,
eos-event-recorder-daemon, and eos-metrics-instrumentation packages
from the workstation-product comps group, and rebuild any packages
that gained a dependency on eos-metrics.
* Contingency deadline: beta freeze
* Blocks release? If the change is incomplete, it will need to be
reverted before release.

== Documentation ==

This feature depends on several different upstream projects, each of
which has its own documentation.

Client side components:

* eos-metrics has online documentation in the form of its
[https://github.com/endlessm/eos-metrics/blob/master/data/com.endlessm.Metrics.xml
D-Bus interface XML]. API documentation is also built and installed in
a docs subpackage.
* eos-event-recorder-daemon and eos-metrics-instrumentation components
do not have online documentation at this time.

Server-side documentation:

* [https://github.com/endlessm/azafea-metrics-proxy/tree/master/docs/source
Azafea-metrics-proxy]
* [https://github.com/endlessm/azafea/tree/master/docs/source Azafea]
* [https://azafea.readthedocs.io/en/latest/events.html Events
recognized by the server] (this documentation is currently focused on
use by Endless OS rather than by Fedora, and includes documentation of
many events that are no longer sent by Endless OS)

== Release Notes ==

These will be provided if the proposal is approved and successfully implemented.

-- 

Aoife Moloney

Fedora Operations Architect

Fedora Project

Matrix: @amoloney:fedora.im

IRC: amoloney

-- 
_______________________________________________
devel-announce mailing list -- devel-announce@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-announce-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel-announce@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue



