Introducing the ARC sub-team in CPE - and first research topic

Pierre-Yves Chibon <pingou@xxxxxxxxxxxx> · Mon, 18 Jan 2021 16:25:09 +0100

Good Morning Everyone,

While planning work, the CPE team has realized that a number of our initiatives
actually start with a research phase to find the most appropriate technical
solution.
This causes planning problems: without knowing which technical solution we
will adopt, it's hard to estimate the amount of work needed and thus the time
it will take.

In order to help with this, we're creating a small sub-team in CPE, called the
ARC team, for Advance Reconnaissance Crew*.
The goal of this team will be to investigate what we believe to be the possible
technical solutions for initiatives and advise the CPE team on which one it
believes is the most appropriate.
To this end, we will reach out when we start looking for ideas, as you may
think of options that we did not.

The first investigation, led by Will Woods, Mark O'Brien and me, will be around
datanommer and datagrepper.

datanommer is an application listening to fedmsg and filling a (postgresql)
database with all the messages passing on the bus.
datagrepper is a web application exposing these messages and offering a way to
filter or search them.
    available at: https://apps.fedoraproject.org/datagrepper/
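
As a rough illustration of the filtering datagrepper offers, here is a small
Python sketch that only builds a search URL and sends no request. The base URL
is the one above; the "raw" endpoint and the topic, delta and rows_per_page
parameter names are assumptions based on the datagrepper documentation, and
the topic value is just an example, so please check the running instance
before relying on them.

```python
from urllib.parse import urlencode

# Base URL taken from this email. The "raw" endpoint and the parameter
# names below are assumptions, not confirmed here.
BASE_URL = "https://apps.fedoraproject.org/datagrepper/"


def build_query_url(topic=None, delta=None, rows_per_page=None):
    """Build a datagrepper search URL without sending any request."""
    params = {}
    if topic is not None:
        params["topic"] = topic  # only messages with this topic
    if delta is not None:
        params["delta"] = delta  # look-back window (assumed to be in seconds)
    if rows_per_page is not None:
        params["rows_per_page"] = rows_per_page  # page size
    query = urlencode(params)
    return BASE_URL + "raw" + ("?" + query if query else "")


# Example topic, used for illustration only.
url = build_query_url(
    topic="org.fedoraproject.prod.git.receive", delta=3600, rows_per_page=10
)
print(url)
```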

Currently our ideas are:
- for datanommer:
    - port it to fedora-messaging
    - adjust it to whichever solution we choose to replace datagrepper

- for datagrepper:
    - keep it as is
    - replace it with
        - postgrest https://postgrest.org/
        - prest https://github.com/prest/prest
        - kinto https://docs.kinto-storage.org/en/stable/
        - Swagger/OpenAPI https://swagger.io/
    - add support for GraphQL

- for the postgresql server:
    - Split messages per year into different tables
        - Unite them using a postgresql view
    - Kick out the old messages per year
        - Keep the current year + n-1 in the current DB
        - Kick the other to another DB?
        - Kick the other to a tarball somewhere?
        - Output the database daily dump to file / year
    - TimescaleDB, a postgresql plugin for time-series data
        - https://alibaba-cloud.medium.com/postgresql-time-series-database-plug-in-timescaledb-deployment-practices-6a07e246eb0d
        - https://dev.t-matix.com/blog/postgresql-as-a-time-series-database/
        - https://docs.timescale.com/latest/introduction
    - Make the msg field in the message table be a JSON field
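
To make the "one table per year, united by a view" idea more concrete, here is
a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for
postgresql so that it runs anywhere. The table names (messages_<year>), the
view name (messages), the three columns and the reading of "current year + n-1"
as "the current year and the previous one" are all assumptions made for
illustration, not the actual datanommer schema.

```python
import sqlite3

# SQLite stands in for postgresql here; all names are hypothetical.
conn = sqlite3.connect(":memory:")


def year_table(year):
    """Name of the per-year table, e.g. messages_2021."""
    return f"messages_{int(year)}"


def create_year_tables(conn, years):
    """Create one message table per year."""
    for year in years:
        conn.execute(
            f"CREATE TABLE {year_table(year)} "
            "(topic TEXT, timestamp TEXT, msg TEXT)"
        )


def create_union_view(conn, years):
    """(Re)create a 'messages' view uniting the given per-year tables."""
    selects = " UNION ALL ".join(
        f"SELECT topic, timestamp, msg FROM {year_table(y)}" for y in years
    )
    conn.execute("DROP VIEW IF EXISTS messages")
    conn.execute(f"CREATE VIEW messages AS {selects}")


def years_to_keep(current_year, n=1):
    """Current year plus the n previous ones (assumed reading of 'n-1')."""
    return list(range(current_year - n, current_year + 1))


years = [2019, 2020, 2021]
create_year_tables(conn, years)
for year in years:
    # One example row per year; msg holds JSON text in this sketch.
    conn.execute(
        f"INSERT INTO {year_table(year)} VALUES (?, ?, ?)",
        ("example.topic", f"{year}-06-01T00:00:00", '{"n": 1}'),
    )

# View over every year: all messages are visible.
create_union_view(conn, years)
total = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]

# View over the kept years only: older tables can then be moved elsewhere.
create_union_view(conn, years_to_keep(2021))
recent = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(total, recent)
```

With such a layout, datagrepper could keep querying a single name while the
older per-year tables are dumped or moved to another DB. In postgresql the msg
column could also be a JSON field rather than text, as suggested in the last
item above.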

Would you have any other ideas of things we could look at?

Looking forward to your input,

Thanks,
Pierre, Will and Mark

* Our notes and documentation are hosted at:
  https://fedora-arc.readthedocs.io/en/latest/index.html