Re: The packages app has a short runway

Hi,

> How does the indexing work?

You point Yacy at a domain or a list of URLs
(https://fnux.fedorapeople.org/pkgs/ in this case), and it takes care of
the crawling and indexing. There is also an advanced crawler panel in the
UI that lets you filter content out of pages (e.g. by HTML class), which
would be useful if we do not want to index everything (e.g. dependencies).

I am not familiar with the maths used by Yacy for indexing.

> And what would it take to add more info for each package?

I wrote a quick script (https://paste.gnugen.ch/raw/4JAC) that fetches
package metadata from PDC and mdapi for testing, but it is way too slow
to scale to the whole package set.

mdapi will have to be replaced by a local SQLite database to improve
performance. I think we could generate most of the content from the
repositories' metadata (last N Fedora releases + EPEL), but I need to
find where the SQL files live. A privileged dist-git endpoint that
returns the package -> maintainer mapping without pagination would be
convenient.
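
To illustrate the local SQLite idea, here is a minimal sketch. It assumes
the repository metadata database has a "packages" table with "name",
"summary" and "url" columns (the usual layout of the primary database in
repodata, but worth checking against the real files). The helper name
load_package_info is hypothetical and is not part of the script linked
above.

```python
import sqlite3


def load_package_info(conn, names):
    """Return {name: {"summary": ..., "url": ...}} for the requested packages.

    Assumption: `conn` points at a database with a `packages` table that
    has `name`, `summary` and `url` columns.
    """
    names = list(names)
    if not names:
        return {}

    # One bound parameter per name, so the whole batch is a single query
    # instead of one HTTP round trip per package.
    placeholders = ",".join("?" for _ in names)
    rows = conn.execute(
        f"SELECT name, summary, url FROM packages WHERE name IN ({placeholders})",
        names,
    ).fetchall()

    # The same package can appear once per architecture; the dict keeps
    # a single entry per name.
    return {name: {"summary": summary, "url": url} for name, summary, url in rows}
```

For very large batches the names would need to be split into chunks,
since SQLite limits the number of bound parameters per statement.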

We can use Yacy's JSON API to build a sexy Fedora-branded search page,
but I think that is a late-stage optimization.
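
For reference, a rough sketch of what querying the JSON API from such a
page's backend could look like. The endpoint name (yacysearch.json), the
"query" and "maximumRecords" parameters, and the "channels"/"items"
response layout are assumptions to verify against the actual instance.
The base URL passed in and the helper names are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, max_records=10):
    """Build the search URL (endpoint and parameter names are assumptions)."""
    params = urlencode({"query": query, "maximumRecords": max_records})
    return f"{base_url.rstrip('/')}/yacysearch.json?{params}"


def parse_results(payload):
    """Flatten the assumed channels -> items layout into a list of dicts."""
    results = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            results.append(
                {
                    "title": item.get("title", ""),
                    "link": item.get("link", ""),
                    "description": item.get("description", ""),
                }
            )
    return results


def search(base_url, query, max_records=10):
    """Query the instance and return the flattened results."""
    url = build_search_url(base_url, query, max_records)
    with urlopen(url, timeout=10) as response:
        return parse_results(json.load(response))
```

Keeping the URL building and the parsing separate from the network call
means the branded page could be tested against a saved response.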

-- 
Timothée


_______________________________________________
infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
