Re: Introducing the ARC sub-team in CPE - and first research topic

Pierre-Yves Chibon <pingou@xxxxxxxxxxxx> · Fri, 26 Feb 2021 18:28:22 +0100



On Mon, Jan 18, 2021 at 04:25:09PM +0100, Pierre-Yves Chibon wrote:
> Good Morning Everyone,
> 
> While planning work, the CPE team has realized that a number of our initiatives
> actually start with a research phase to find the most appropriate technical
> solution.
> This makes planning difficult: without knowing which technical solution we
> want to adopt, it's hard to estimate the amount of work needed and thus the
> time it will take.
> 
> In order to help with this, we're creating a small sub-team in CPE, called the
> ARC team for Advance Reconnaissance Crew*.
> The goal of this team will be to investigate what we believe to be the possible
> technical solutions for initiatives and advise the team on what they believe
> would be the appropriate solution.
> To this end, we will reach out when we start an investigation, as you may have
> ideas that we did not think of.
> 
> The first investigation, led by Will Woods, Mark O'Brien and me, will be around
> datanommer and datagrepper.
> 
> datanommer is an application listening to fedmsg and filling a (postgresql)
> database with all the messages passing on the bus.
> datagrepper is a web application exposing these messages and offering a way to
> filter or search them.
>     available at: https://apps.fedoraproject.org/datagrepper/
> 
> Currently our ideas are:
> - for datanommer:
>     - port it to fedora-messaging
>     - adjust it to whichever solution we choose to replace datagrepper
> 
> - for datagrepper:
>     - keep it as is
>     - replace it with
>         - postgrest https://postgrest.org/
>         - prest https://github.com/prest/prest
>         - kinto https://docs.kinto-storage.org/en/stable/
>         - Swagger/OpenAPI https://swagger.io/
>     - add support for GraphQL
> 
> - for the postgresql server
>     - Split messages per year into different tables
>         - Unite them using a postgresql view
>     - Kick out the old messages per year
>         - Keep the current year + n-1 in the current DB
>         - Kick the other to another DB?
>         - Kick the other to a tarball somewhere?
>         - Output the database daily dump to file / year
>     - TimescaleDB, a postgresql extension for time-series data
>         - https://alibaba-cloud.medium.com/postgresql-time-series-database-plug-in-timescaledb-deployment-practices-6a07e246eb0d
>         - https://dev.t-matix.com/blog/postgresql-as-a-time-series-database/
>         - https://docs.timescale.com/latest/introduction
>     - Make the msg field in the message table be a JSON field
> 
> Would you have any other ideas of things we could look at?
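
To make the "split messages per year, unite them using a view" idea above a
bit more concrete, here is a minimal sketch. It is only an illustration, not
what datanommer does: the table names (messages_<year>), the columns (topic,
timestamp, msg) and both helper functions are hypothetical, and it uses
Python's built-in sqlite3 as a stand-in so it runs without a postgresql
server. The same CREATE TABLE / CREATE VIEW ... UNION ALL pattern should carry
over to postgresql.

```python
import sqlite3


def build_yearly_store(years):
    """Create one table per year plus a view uniting them (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    years = [int(y) for y in years]  # only integers ever reach the SQL text
    for year in years:
        conn.execute(
            f"CREATE TABLE messages_{year} (topic TEXT, timestamp TEXT, msg TEXT)"
        )
    # The view lets readers query all years as if they were a single table.
    union = " UNION ALL ".join(
        f"SELECT topic, timestamp, msg FROM messages_{year}" for year in years
    )
    conn.execute(f"CREATE VIEW messages AS {union}")
    return conn


def store_message(conn, topic, timestamp, msg):
    """Route a message to its year's table, based on an ISO 8601 timestamp."""
    year = int(timestamp[:4])
    conn.execute(
        f"INSERT INTO messages_{year} (topic, timestamp, msg) VALUES (?, ?, ?)",
        (topic, timestamp, msg),
    )


if __name__ == "__main__":
    conn = build_yearly_store([2020, 2021])
    store_message(conn, "org.example.a", "2020-05-01T10:00:00", "{}")
    store_message(conn, "org.example.b", "2021-02-26T18:28:22", "{}")
    for row in conn.execute("SELECT topic, timestamp FROM messages ORDER BY timestamp"):
        print(row)
```

With this layout, writers only touch the current year's table, and dropping
or archiving an old year means removing one table and recreating the view.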

Just as a follow-up to this thread, our findings are available at:
https://fedora-arc.readthedocs.io/en/latest/datanommer_datagrepper/index.html
and I've also presented them in a blog post at:
http://blog.pingoured.fr/index.php?post/2021/02/26/datanommer/datagrepper-investigations


Hoping this helps,
Pierre