Hello,
many of you have already heard about the Fedora Packager Dashboard [0]. In short, for those who have not: it's a web application and backend aiming to make the lives of Fedora packagers easier. It combines data from multiple sources (Pagure, Bugzilla, Bodhi, Koschei, ...) relevant to the maintainers of Fedora packages.
Tracking all these sites can be time-consuming, especially if you maintain dozens of different packages, so the Dashboard provides everything a packager might need (or at least what we've thought of) - condensed, cached, searchable and filterable on one page.
You can check it out and play with it even if you don't maintain any Fedora packages; no authentication is needed, just enter any packager's username. Feature-wise, it's pretty robust already, and there are more things to come (like allowing users to authenticate and see private bugs, and more), but the original planned feature set is complete.
Currently, it's processing only publicly available data. When we start implementing the ability to authenticate and process private data, we'll work closely with the Infra team to make sure there are no open holes.
Since the announcement of the testing phase, it has been running on a temporary server which is barely keeping up, so I'd like to open up a conversation about migration to the Infra OpenShift cluster.
Here is a brief overview of its internals (I can elaborate more if anybody needs me to):
The backend is a Flask/Python application leveraging Celery for planning and executing cache refreshes. The API strives to be as non-blocking as possible, using asynchronous-inspired behaviour: the client is given whatever data is currently available in the cache and is advised about its completeness (complete/partial cache misses) via the HTTP status code.
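To illustrate that pattern, here's a minimal sketch (the endpoint path, the cache helper, and the 200/202 status codes are my assumptions for illustration, not necessarily Oraculum's actual code):

    from flask import Flask, jsonify

    app = Flask(__name__)

    def get_cached_user_data(username):
        # Placeholder for a real lookup against the PostgreSQL cache;
        # `complete` would be False on a partial cache miss, i.e. while
        # Celery workers are still refreshing some of the data sources.
        data = {"username": username, "bugs": [], "prs": [], "updates": []}
        return data, False

    @app.route("/api/v1/packager_dashboard/<username>")
    def packager_dashboard(username):
        data, complete = get_cached_user_data(username)
        # Serve whatever is cached right now instead of blocking; 200
        # would signal a full cache hit, 202 that the data is partial
        # and the client should poll again later.
        return jsonify(data), (200 if complete else 202)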
The parameters for cache refreshes can be customized, depending on the resources we have/can get. Currently, it retrieves data for most items every 2 hours (with exceptions like package versions, which run daily and are terribly slow). The backend caches data for PRs, bugs, and pre-calculated updates/overrides for users who have visited the app at least once in the last two weeks. The main storage is a PostgreSQL database; optionally, if RAM is not an issue, there is a local in-memory cache layer that can be enabled (we don't find it necessary at the moment).
Apart from storing the pre-crunched information from public sources, we keep a timestamp of each user's last visit (this is what drives the two-week refresh window mentioned above).
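To give an idea of what that refresh cadence looks like, here's a rough Celery beat sketch (task names and the broker URL are made up for illustration):

    from celery import Celery
    from celery.schedules import crontab

    app = Celery("oraculum", broker="redis://redis:6379/0")

    app.conf.beat_schedule = {
        # Most cached items are refreshed every 2 hours.
        "refresh-cache": {
            "task": "oraculum.tasks.refresh_cache",
            "schedule": 2 * 60 * 60,  # seconds
        },
        # Package versions are terribly slow to fetch, so they are
        # refreshed only once a day.
        "refresh-versions": {
            "task": "oraculum.tasks.refresh_versions",
            "schedule": crontab(hour=3, minute=0),
        },
    }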
The frontend is a React app fetching and displaying data from the backend (really nothing to add here :) ).
Based on the OpenShift testing, I've arrived at the following set of pods:
- Redis pod (Celery backend)
- PostgreSQL pod (cache and watchdog data storage)
- Beat pod (scheduling tasks for data refresh)
- Flower pod (backend monitoring; nice to have, not absolutely necessary)
- Gunicorn/NGINX pod (Oraculum backend)
- NGINX pod (Packager Dashboard front-end)
On top of that, we need a number of worker pods completing the scheduled Celery tasks.
I am not sure how much RAM each pod in Infra OpenShift has; currently the workers seem to perform best with about a 512 MB memory limit. Ideally we'd like to have at least 12-16 Celery workers (more than one worker can, of course, run on a single pod).
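For illustration, the worker packing could look something like this in Celery terms (the 4-workers-per-pod figure and the exact settings are assumptions, not our deployed configuration):

    from celery import Celery

    app = Celery("oraculum", broker="redis://redis:6379/0")

    # Several worker processes can share one pod; 4 per pod would give
    # the desired 12-16 workers with only 3-4 worker pods.
    app.conf.worker_concurrency = 4

    # Recycle any child process that grows past the ~512 MB per-worker
    # budget mentioned above (the setting takes kilobytes).
    app.conf.worker_max_memory_per_child = 512 * 1024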
Resource-wise it's not a small application (at least from my perspective :D ), but we believe it's a great-value application that saves time for both Red Hat and community package maintainers.
I'd like to hear your feedback, questions, and requests for changes if there is anything (architecture- or code-wise) preventing deployment in Infra OpenShift - and of course your opinions on the feasibility of moving Oraculum and the Packager Dashboard into the Infra OpenShift cluster.
Thanks!
[0] https://packager.fedorainfracloud.org/