Re: RHEL 9 and modularity

clime <clime@xxxxxxxxxxxxxxxxx> · Tue, 23 Jun 2020 20:31:06 +0200

On Mon, 22 Jun 2020 at 08:14, Zbigniew Jędrzejewski-Szmek
<zbyszek@xxxxxxxxx> wrote:
>
> On Mon, Jun 22, 2020 at 04:55:10AM +0200, clime wrote:
> > >> > > Hello Josh,
> > >> > >
> > >> > > you can change the artifact type while keeping interface the same and
> > >> > > it would be a _HUGE_ win because it would make modularity finally
> > >> > > understandable for mere humans and better maintainable.
> > >> > >
> > >> > > Namely, modules should become rpms and therefore obey standard rpm rules.
> > >> >
> > >> > I'm not sure I entirely understand what you mean, but it sounds like
> > >> > you have some interesting ideas.
> > >> >
> > >> > I'm looking forward to seeing what you and the community can build
> > >> > from them, and how they could be brought into RHEL 10+!  That kind of
> > >> > collaboration is what makes Fedora great.
> > >>
> > >> I know this probably won't change anything because this was mentioned
> > >> many times (by me at least) and nothing has changed but still...
> > >>
> > >> Currently, modules are essentially yum sub-repos, they are not really
> > >> "modules", instead they are collections of rpms that reinvent rpm-like
> > >> relations (obsoletes, requires, build-requires, etc.).
> > >>
> > >> There is no reason for this wheel-reinvention. Modules (the
> > >> collections) can be simply squashed into an rpm by automation and this
> > >> resulting rpm can go to the modular repo together with other modules.
>
> I agree with this general idea, even if not with the exact implementation
> (comments below). In the past this was stated as "divorcing the build ordering
> mechanism from the rpm delivery mechanism". The fact that we have two layers
> of dependencies makes Modularity conceptually hard and destroys the interaction
> with the dependency solver. Also, if we disconnect the build and delivery
> mechanisms, we can iterate and improve both separately.
>
> > >> That way we don't have two types of objects with complex inter-relations
> > >> but only one with well-known behavior.
> > >>
> > >> I wonder if this is clear to everyone but nobody really cares or
> > >> doesn't really want to say it or I don't know.
> > >>
> > >> Is this clear to everyone? I mean either I am stating an obvious stuff
> > >> that nobody really considers worth typing or idk.
> > >
> > >
> > > How would this work when there are optional rpms in the module?
> > >
> > > You do not need to install every rpm in e.g. the php module (different graphics/database backends) for that module to be useful, but every version of the module will have the rpm as an option, which won't work outside a module of multiple rpms.
> >
> > Glad you ask, I wasn't precise...
> >
> > Well, I didn't mean everything always needs to be squashed, instead,
> > it would be an optional step in modulemd processing.
>
> So... if it's only optional, that means that the general case where
> squashing is not done needs to be solved anyway. And once you have
> solved the general case, what would the point of squashing be?
> Thus, I don't find squashing useful.
>
> > For some
> > use-cases (like delicately compiled postgresql server), you can create
> > a single rpm that contains all - postgresql-server, postgresql,
> > postgresql-libs compiled in a specific way, optionally with some
> > postgresql modules pre-included, so it would be let's say time-series
> > optimized postgresql. Here it makes sense to make a single rpm from it
> > - you install that and you are all set up for your use-case.
> >
> > Then there are language stacks where you might want to build things in
> > a specific order - there nothing really needs to be squashed (or
> > certain subset can if it makes sense) but you can still use modularity
> > to easily batch-build certain rpms. If there are runtime optional
> > deps, they can be described by Recommends/Suggests.
> >
> > Basically, once a "module" (the thing that comes from modulemd) is built,
> > it should be put into normal repos and the "module" boundary should be
> > forgotten (unless it is a single rpm), i.e. "module" is a build-time
> > thing, at install-time we just have standard packages with standard
> > deps.
>
> Yep.
>
> The unanswered question is what mechanism would be used to make sure that
> the rpms from the "module" are all installed. One option would be to
> somehow mangle rpm names, another option would be to add some kind of
> Provides/Requires, etc. But *some* mechanism is needed, because without
> that dnf would often pick other rpms.
>
> In Modularity the solution is that the rpms from the module shadow
> rpms with the same name from outside. That's probably the single
> feature of Modularity that causes the most problems.

Yeah, I have noticed that modularity causes quite a lot of trouble.

The question is whether that trouble can somehow be technically
justified, i.e. by the benefit modularity will eventually bring.

But I personally don't see it. Actually (big revelation), I have never
understood modularity, and I was even there in RH when it was just
starting to be born. Probably one of the reasons why I didn't
understand it at that point was that there seemed to be no clear
specification floating around. We knew modularity should solve the "too
fast, too slow" problem of distributions but it wasn't exactly clear
how. It was a cool buzzword but nobody seemed to know what it meant
(at least that was my view of the situation).

I and perhaps others thought that it was meant to provide parallel
availability+installability of the distribution software, but after
some time it was made clear that this isn't the case and that the goal
is just "parallel availability".

OK, but even this isn't clear to me.

I mean I understand the usefulness of modularity for build-time where
you can create a recipe (modulemd) that will build your packages with
interdependencies in a predictable and automatic manner. I think
that's a cool thing to have.

But what about run-time (or install-time in other words)? That's the
part I don't understand. I have spent quite some time trying to
understand it but never managed to. So it's possible that I have been
missing something all these years...in that case, it would be great if
somebody could shed some light on it for me. Here, I would really like
to give modularity a chance.

From what I understand, the use of modularity at run-time is to
provide rpm namespaces. The natural way to do this would be to use
separate repositories à la COPR, where rpms are namespaced by repo ID,
but I know one of the requirements of modularity was to use a single
repo for those namespaces, with the argument that dnf is slow when
working with a large number of repos...to me that reason always seemed
quite artificial...something is slow...ok, then it can be made faster.
I could understand it if we were talking about, let's say, thousands of
modules - there I would believe that initiating a thousand (or a few
times that) new downloads of repo files might already have its price.
So okay, if thousands of modules were the plan, then I could
understand this argument.

But now comes an even more curious part. So...run-time modularity
provides rpm namespacing, if I understand it correctly. Basically
<module>/<stream>/<package_name>. The easy solution for this would be
to put the namespace implicitly into the package name, like python does
when there should be multiple pythons available, e.g. currently in
CentOS7/EPEL7, there is python34-requests and python36-requests (I
understand there will be a dot between major and minor at some point
so e.g. python3.6-requests but that's another thing :)). So if we have
different rpm names (because the namespace is already included in the
rpm name itself), then there is no problem to provide multiple
variants of the "same package" (the same thing but intended e.g. for a
different python interpreter) in the same repo.
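
To make the naming argument above concrete, here is a minimal Python
sketch. It is only an illustration under my own assumptions: the repo
is modeled as a plain list of dicts and lookup is by rpm name only. The
candidates() helper, the unmangled "python-requests" name and the
module/stream labels are hypothetical; this is not how dnf or the repo
metadata actually work.

```python
def candidates(repo, name):
    """Return every package in the repo whose rpm name matches exactly."""
    return [pkg for pkg in repo if pkg["name"] == name]


# Namespace folded into the rpm name (the python34-/python36- approach).
mangled_repo = [
    {"name": "python34-requests", "interpreter": "python3.4"},
    {"name": "python36-requests", "interpreter": "python3.6"},
]

# Same rpm name for both variants; a (hypothetical) module/stream label
# is then the only thing that tells them apart.
namespaced_repo = [
    {"name": "python-requests", "module": "python", "stream": "3.4"},
    {"name": "python-requests", "module": "python", "stream": "3.6"},
]

# A name-only lookup is unambiguous in the first repo...
by_name = candidates(mangled_repo, "python36-requests")
# ...but returns two candidates in the second one.
ambiguous = candidates(namespaced_repo, "python-requests")

print(len(by_name), len(ambiguous))
```

With mangled names the ordinary name lookup is enough; with shared
names something outside the rpm name has to pick the variant.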

So I would be willing to accept that this is a hacky solution or just
a workaround (even though I am not sure it is). But even if I accept
that it is just a hack and we need a more proper solution, I still
have an issue in my mind. Let's say we have this two-level namespacing
(<module>/<stream>/) and it enables us to have a package of the same
name twice or more times in the same repo and it enables us to avoid
mangling the rpm names. Great, isn't it? Well...but what if those
different variants of the same package are actually
parallel-installable and a user would benefit from having them
installed in parallel (because, say, it's a dev working with different
versions of the same language at the same time)? We can only install a
package of a certain name once into the system, so that's why
modularity only ever lets us use a single stream out of all the
available streams of a module, i.e. you can only switch between the
individual streams; having several of them enabled at the same time
is not possible.
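
The "one stream per module" rule described above can be sketched as a
toy model in Python. Everything here is an assumption made for
illustration: the EnabledStreams class, the StreamConflict exception
and the php stream names are hypothetical, and this is not dnf's real
implementation.

```python
class StreamConflict(Exception):
    """Raised when a second stream of an already enabled module is requested."""


class EnabledStreams:
    """Toy model: at most one enabled stream per module."""

    def __init__(self):
        self._enabled = {}  # module name -> the single enabled stream

    def enable(self, module, stream):
        current = self._enabled.get(module)
        if current is not None and current != stream:
            raise StreamConflict(
                f"{module}:{current} is enabled; cannot also enable {module}:{stream}"
            )
        self._enabled[module] = stream

    def switch(self, module, stream):
        # Switching replaces the previous stream instead of adding one.
        self._enabled[module] = stream

    def stream_of(self, module):
        return self._enabled.get(module)


state = EnabledStreams()
state.enable("php", "7.3")
try:
    state.enable("php", "7.4")  # a second stream of the same module
    conflict = False
except StreamConflict:
    conflict = True
state.switch("php", "7.4")
```

In this model the second enable() is refused and only switch() gets
you to the other stream, which is exactly the limitation I mean.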

So basically, modularity gives parallel-availability but at the same
time, it disables the option of parallel-installability, which could be
achieved through alternatives and some smart packaging for probably
all the language stacks, if I understand correctly. I think that's
too much of a limitation. To avoid it, we would need to keep an rpm DB
per namespace (<module>/<stream>/) and these various DBs would be
handled by dnf, which would basically mean that the rpm command itself
wouldn't know everything that is installed on the system - hard to
imagine that people would be alright with that. ...Or we can bring the
notion of namespaces into rpm itself (that's where my suggestion of a
"Stream" rpm attribute comes from, but it could also be called just
"Namespace"). But then there is the argument: "Why not just put the
namespace into the rpm name itself?" I mean...I wouldn't mind having it
as a separate attribute but the usefulness of it would need to be
discussed.
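
As a rough sketch of the difference between an installed-package DB
keyed by name and one that knows about namespaces, consider the Python
below. It is a toy model under stated assumptions: the install()
helper, the "python-requests" name and the namespace strings are
hypothetical, it follows the simplification that a name can be
installed only once, and it ignores file conflicts on disk entirely.

```python
def install(db, key, package):
    """Record a package under key; refuse if the key is already taken."""
    if key in db:
        return False
    db[key] = package
    return True


variants = [
    {"name": "python-requests", "namespace": "python/3.4"},
    {"name": "python-requests", "namespace": "python/3.6"},
]

# Keyed by name only: the second variant is rejected.
by_name = {}
name_results = [install(by_name, pkg["name"], pkg) for pkg in variants]

# Keyed by (namespace, name): both variants fit side by side.
by_namespace = {}
ns_results = [
    install(by_namespace, (pkg["namespace"], pkg["name"]), pkg)
    for pkg in variants
]

print(name_results, ns_results)
```

Note that putting the namespace into the key has the same effect here
as putting it into the name, which is the "why not just mangle the
name?" argument again.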

So even after almost five years, I don't really get where modularity
is going or what it wants to achieve. I don't understand its use-case
for any of Fedora, RHEL, and CentOS because disabling
parallel-installability to allow parallel availability is imho not
really an option. But yeah...maybe I am missing some angle. In that
case, please, explain it to me because I would really like to
understand...

clime

>
> > The dnf interface could be kept, provided that a "Stream" rpm property
> > is added. What I am saying is still a bit rough but hopefully it
> > makes at least a bit of sense...
>
> Zbyszek
> _______________________________________________
> devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx