Hello Fedora developers,

I'd like to show you a proposal for a new XML format for the modular metadata which resides in YUM repositories.

In short, I propose replacing the YAML syntax with XML syntax, removing features which were never implemented or used, and providing a detailed specification that leaves little room for implementers' invention.

The proposed specification is the "reduced" variant under <https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs>, for instance <https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/reduced/overview.xml>.

Bear in mind that this change is only about how modules are stored in YUM repositories which are fetched by DNF. It does not change how modules are defined by module maintainers (YAML modulemd-packager-v3 or modulemd-v2 format), how they are built by MBS, or how they are handled by Bodhi. Those who should be concerned most are DNF5 developers and release engineers producing composes.

Long story:

The original modulemd format had a noble property: the input format for MBS was the same as the output format. This is not true anymore because of the modulemd-packager-v3 format. Using one format for both purposes also makes validation difficult, as fields optional in the input format are mandatory in the output format, or vice versa.

The original modulemd format drags YAML into the YUM repository, which is otherwise XML-only. That requires a YAML parser.

The original modulemd format is not handled by DNF directly. Instead, DNF uses the libmodulemd library. That library is heavily based on glib. In fact, it embeds glib types into its API. Why do I mention it? Because the new DNF5 aims to eradicate glib, mostly to shrink container installations. librepo and libmodulemd are the last pieces with glib. Because it's impossible to remove glib from libmodulemd, there has to be a new library for parsing modular metadata. If there has to be a new library, there could also be a transition from YAML to XML, which would shrink the minimal installation further by removing libyaml.

The original modulemd format possesses some features which nobody uses, or nobody implements, or if somebody implements them, then not fully. Do you remember the deprecation of intents from modularity <https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/thread/RXDP2WMPR3HHBRTQAKPSTRU6KABTJSMA/#RXDP2WMPR3HHBRTQAKPSTRU6KABTJSMA>? There are more things that can be removed to make the format and its parser simpler.

The original format is not well specified. DNF and Satellite people complained a lot when they were implementing it. The specification looks more like an example. E.g. a module stream name is probably a string. An arbitrary string. With spaces, with new lines. I think you do not want to see a stream named " :\n". Well, DNF does not even allow you to identify a module like that. There is definitely room for tightening the format. But each change like that is technically an incompatible change. To materialize the change we need at least a new modulemd format version. But if we need a new format version, we can just as well come up with a completely new format.

As you can see, there are good reasons to come up with a new in-repository format. Hence here it is <https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs>.

I originally developed the XML format to be able to encode all features we have in the old YAML format. That's kept for your reference in the "complete" subdirectory <https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs/complete>.

Then I removed all unnecessary features and put the result into the "reduced" subdirectory <https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs/reduced>.

If you are interested in it, I recommend starting with the overview.xml file. It shows a skeleton of the format. It's so small I can quote it here:

<index xmlns="http://fedoraproject.org/metadata/moduleindex" version="" revision="">
  <module name="">
    <stream name=""> <!-- DNF wants versions and contexts to differ in @summary etc. -->
      <build version="" context="" static="" arch="" summary="" description=""> <!-- @static defaults to false. -->
        <dependency name="">
          <requires></requires> <!-- Only one for modulemd-packager-v3 -->
          <conflicts></conflicts> <!-- Not supported by modulemd-packager-v3 -->
        </dependency>
        <dependency name=""/> <!-- An unspecified stream. Not supported by modulemd-packager-v3. -->
        <license>
          <module></module>
          <content></content>
        </license>
        <references comunity="" documentation="" tracker=""/>
        <profile name="" description="">
          <package></package>
        </profile>
        <api></api>
        <demodularized></demodularized>
        <nevra name="" epoch="" version="" release="" arch=""/>
      </build>
      <default-profile modified=""> <!-- @modified could be renamed to version -->
        <profile></profile> <!-- With a value replaces, missing unsets. -->
      </default-profile>
      <obsolete modified="" context="">
        <!-- @modified in seconds since the epoch. Missing or empty @context means all contexts. -->
        <eol when="" message=""> <!-- Missing element means unsetting. -->
          <!-- @when in seconds since the epoch, missing means now. -->
          <replacement module="" stream=""/>
        </eol>
      </obsolete>
      <translation modified=""> <!-- @modified could be renamed to version -->
        <locale name=""> <!-- Each of the child is optional, but there must be at least one. -->
          <build summary="" description=""/> <!-- missing @summary, @description unsets -->
          <profile name="" description=""/> <!-- missing @description unsets -->
          <obsolete context="" message=""/>
          <!-- missing or empty @context means all contexts, missing @message unsets, unsupported in YAML. -->
        </locale>
      </translation>
    </stream>
    <default-stream modified="" stream=""/> <!-- @modified could be renamed to version -->
    <!-- Existing @stream sets a default, missing or empty unsets. -->
  </module>
</index>

As you can see, there are no separate documents for modules and default streams. Everything is kept inside one document. That enables properties (e.g.
obsoletes or default profiles) pertaining to the same entity (e.g. a stream) to be placed together. That avoids repeating the identifiers (e.g. stream names) and makes the format more succinct and easier to query. That's especially important for DNF, which needs to quickly obtain the list of modules and the streams of each module, to find out the latest build, etc.

An example.xml file shows how real data would look <https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/reduced/example.xml>. You can see, for example, that time stamps are encoded as a number of seconds since the Unix epoch. That will save DNF from parsing e-mail date notations, handling time zones, etc.

There is also a formal specification in the form of an XML Schema <https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/reduced/schema.xsd>, and a tests subdirectory with a preliminary set of good and bad examples which pass and fail validation, respectively.

I'd be glad to hear any comments on the format.

A grand plan for how to implement and deploy this format is outlined in the top-level README.md <https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/README.md>. Basically, support will be added to the createrepo_c tool to produce the XML data in YUM repositories. Then the format will be consumed by DNF5.

(Just to clarify, the currently missing support for modules in DNF5 is not caused by this new XML format. DNF5 will support modules in the old YAML format soon through the libmodulemd library.)

According to my consultation with the DNF team, DNF5 plans to prefer the XML format if both XML and YAML exist in a repository.

-- Petr
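
P.S. To make the querying argument concrete, here is a minimal Python sketch. It is not DNF's code and not part of the specification. It assumes that @version of a build is an integer, and it uses a made-up sample document shaped like the skeleton in overview.xml; the sample has not been validated against schema.xsd, and the function name summarize is hypothetical. It lists modules with their streams, the default stream, and the highest build version per stream, and it decodes the EOL time stamp from seconds since the epoch:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace taken from the skeleton; the "m" prefix is only a local alias.
NS = {"m": "http://fedoraproject.org/metadata/moduleindex"}

# Made-up sample data shaped like the skeleton (assumption: not schema-validated).
SAMPLE = """\
<index xmlns="http://fedoraproject.org/metadata/moduleindex" version="1" revision="0">
  <module name="perl">
    <stream name="5.30">
      <build version="20" context="aaaa" arch="x86_64" summary="Perl" description="Perl 5.30"/>
      <build version="30" context="aaaa" arch="x86_64" summary="Perl" description="Perl 5.30"/>
      <obsolete modified="1600000000">
        <eol when="1650000000" message="Stream 5.30 is retired"/>
      </obsolete>
    </stream>
    <stream name="5.32">
      <build version="10" context="bbbb" arch="x86_64" summary="Perl" description="Perl 5.32"/>
    </stream>
    <default-stream modified="1600000000" stream="5.32"/>
  </module>
</index>
"""


def summarize(xml_text):
    """Return {module: {"default": stream or None, "streams": {...}}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for module in root.findall("m:module", NS):
        streams = {}
        for stream in module.findall("m:stream", NS):
            builds = stream.findall("m:build", NS)
            # Assumption: @version is an integer, so the latest build is the maximum.
            latest = max((int(b.get("version")) for b in builds), default=None)
            eol = stream.find("m:obsolete/m:eol", NS)
            eol_when = None
            if eol is not None and eol.get("when"):
                # Seconds since the epoch need no date-string or time-zone parsing.
                eol_when = datetime.fromtimestamp(int(eol.get("when")), tz=timezone.utc)
            streams[stream.get("name")] = {"latest": latest, "eol": eol_when}
        default = module.find("m:default-stream", NS)
        # A missing or empty @stream means no default is set.
        default_name = (default.get("stream") or None) if default is not None else None
        result[module.get("name")] = {"default": default_name, "streams": streams}
    return result


if __name__ == "__main__":
    for name, info in summarize(SAMPLE).items():
        print(name, "default stream:", info["default"])
        for stream_name, data in info["streams"].items():
            print("  stream", stream_name, "latest build:", data["latest"], "EOL:", data["eol"])
```

Because everything about a stream sits under one element, a single pass over the tree is enough; with separate documents the consumer would first have to join them by module and stream name.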