Re: Future of fedora-packages

Clement Verna <cverna@xxxxxxxxxxxxxxxxx> · Thu, 28 Feb 2019 08:59:14 +0100

On Thu, 28 Feb 2019 at 00:06, Stephen John Smoogen <smooge@xxxxxxxxx> wrote:
>
> On Wed, 27 Feb 2019 at 16:05, Jim Perrin <jperrin@xxxxxxxxxx> wrote:
> >
> > How much heresy is involved in us using Amazon's elasticsearch service
> > for this, so that we don't have yet-another-thing to maintain?
> >
>
> I was wondering how much data are we looking to shove there, does that
> data need to be 'protected', and how fast do we need it to be for us
> to talk back and forth to the cloud. The heresy side I don't have any
> say in..

For fedora-packages we want to store documents that contain package
information (see the current structure at
https://github.com/fedora-infra/fedora-packages/blob/master/fedoracommunity/search/index.py#L241).
Currently in production we have 23849 documents in the xapian database,
so I honestly don't think that will be much trouble for elasticsearch.
Writing to the cluster should be restricted and I think the search
service should be public; elasticsearch provides Security Privileges
(https://www.elastic.co/guide/en/x-pack/current/security-privileges.html)
that seem to fit with that idea.
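
As a rough sketch of the restricted-write / public-search split, the two
roles could look something like the following. The index name `packages`,
the constant names and the `role_body` helper are hypothetical and only
there for illustration; `read`, `index` and `create_index` are index
privileges listed in the Security Privileges page linked above, and the
dictionaries follow the general shape of a role definition.

```python
import json

# Hypothetical index name; not taken from the fedora-packages code.
INDEX_NAME = "packages"


def role_body(index_privileges):
    """Build a role definition limited to a single index."""
    return {
        "cluster": [],
        "indices": [
            {"names": [INDEX_NAME], "privileges": list(index_privileges)}
        ],
    }


# Public, read-only role for the search frontend.
SEARCH_ROLE = role_body(["read"])

# Restricted role for the indexer, which creates and fills the index.
INDEXER_ROLE = role_body(["create_index", "index"])

if __name__ == "__main__":
    print(json.dumps({"search": SEARCH_ROLE, "indexer": INDEXER_ROLE}, indent=2))
```

The idea is that only the indexer credentials would ever be able to
modify documents, while whatever the frontend uses could only read.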

Indexing does not have to be crazy fast. For example, fedora-packages
indexing currently takes between 2 and 3 hours, so I don't think
network latency will matter much here. Searching is a bit more
sensitive, since users usually don't want to wait more than a second
or so to get search results, but if we use the elasticsearch
javascript library
(https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/index.html)
and handle the search in the frontend then it does not have to go via
our infrastructure.
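
To give an idea of what such a search request could look like, here is a
minimal sketch of the query body. It is written in Python only to keep it
short; the same JSON body could be sent from the javascript library. The
field names (`name`, `summary`, `description`), the boost on `name` and
the `build_search_body` helper are assumptions for illustration, the real
fields would come from the document structure linked above. `multi_match`
is a standard Elasticsearch query type.

```python
import json

# Hypothetical field names; "name^3" boosts matches on the package name.
SEARCH_FIELDS = ["name^3", "summary", "description"]


def build_search_body(text, size=10):
    """Return an Elasticsearch query DSL body for a full text package search."""
    return {
        "size": size,
        "query": {"multi_match": {"query": text, "fields": SEARCH_FIELDS}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("python requests"), indent=2))
```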

>
>
> > On 2/27/19 4:19 AM, Stephen John Smoogen wrote:
> > > On Tue, 26 Feb 2019 at 14:39, Clement Verna <cverna@xxxxxxxxxxxxxxxxx> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> fedora-packages [0] code base is showing its age. The code base and
> > >> the technology stack (Turbogears2 [1] web framework and the Moksha
> > >> [2] middleware) are currently not ready for Python3 and I am not
> > >> planning to do the work required to make them Python3 compatible, so the
> > >> application will stop working when Fedora 29 is EOL.
> > >>
> > >> In order to keep the service running, I have started a Proof Of
> > >> Concept (fedora-search [3]) to replace the backend of the application.
> > >> Fedora-search would be a REST API service offering a full text search
> > >> API. Such a service would then be available for other applications to
> > >> use, and fedora-packages would become a frontend-only application
> > >> using the service provided by fedora-search.
> > >>
> > >> While the POC shows that this is a viable solution, I don't think that
> > >> we should proceed that way, for the simple reason that this adds
> > >> yet another code base to maintain. I think we should use this
> > >> opportunity to consider using Elasticsearch instead of maintaining our
> > >> own "search engine".
> > >>
> > >
> > > The main issues with getting elasticsearch working in the past were the following:
> > >
> > > 1. The number of systems needed to make it work. There is a large
> > > difference from their 'proof-of-concept see how great this is' to 'ok
> > > you want to do anything with load' setups in everything from storage
> > > to number of search nodes to network speeds. [The amount of hardware
> > > for the data we have was to start with 5-8 dedicated Dell systems,
> > > some amount of shared fast storage, and N virtual machines with a
> > > 10-40GB backbone.. or throwing all of Fedora Infrastructure at once
> > > into the cloud.. because feeding it from PHX2 to the cloud is
> > > expensive.]
> > >
> > > 2. Packaging of elasticsearch was a mess. At the time we had rules
> > > that all packages needed to be packaged in Fedora and follow Fedora
> > > packaging rules. [This one has been relaxed.]
> > >
> > > 3. Running of elasticsearch was a large service in itself. It doesn't
> > > take care of itself and we would need one or more people who know it
> > > well to keep it running. [This goes down the ladder.. the logstash
> > > backends are also full services.. ] Most of that was written in Java
> > > which no one on the team at the time had good experiences with.
> > >
> > > 4. A kibana/elasticsearch query expert. Just like any database, most
> > > of the queries you can make are the worst kind. They will take a lot
> > > more CPU/memory/time than they should, making just grepping for data
> > > faster.
> > >
> > > However that was 3-5 years ago.. so a lot has changed since then.
> > >
> > >
> > >> I think that Elasticsearch offers quite a few advantages :
> > >>   - Powerful Query language
> > >>   - Python bindings
> > >>   - Javascript bindings
> > >>   - Can be deployed in our infrastructure or used as a service
> > >>   - Can be useful for other applications ( docs.fp.o, pagure, ??)
> > >>
> > >> So what is the general feeling about using Elasticsearch in our
> > >> infrastructure? Should we look at deploying a cluster in our infra, or
> > >> should we approach the Council to see if we can get funding to have
> > >> this service hosted by Elastic?
> > >>
> > >> Thanks
> > >> Clément
> > >>
> > >> [0] - https://apps.fedoraproject.org/packages/
> > >> [1] - http://www.turbogears.org/
> > >> [2] - https://mokshaproject.github.io/mokshaproject.net/
> > >> [3] - https://github.com/fedora-infra/fedora-search
> > >> _______________________________________________
> > >> infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> > >> To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
> > >> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> > >> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > >> List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> > >
> > >
> > >
>
>
>
> --
> Stephen J Smoogen.