Re: [RFC] Optionally using git repositories instead of the lookaside cache

On Thu, 9 Apr 2020 at 01:28, Neal Gompa <ngompa13@xxxxxxxxx> wrote:
>
> On Wed, Apr 8, 2020 at 7:17 PM clime <clime@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, 9 Apr 2020 at 01:08, Neal Gompa <ngompa13@xxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 8, 2020 at 6:55 PM clime <clime@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, 9 Apr 2020 at 00:48, Neal Gompa <ngompa13@xxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Apr 8, 2020 at 4:27 PM Jeremy Cline <jeremy@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > The Fedora kernel is moving to maintaining the package in a source
> > > > > > (sometimes people refer to it as an "exploded") tree. Basically just a
> > > > > > fork of upstream. This makes a lot of packager tasks easier, but has
> > > > > > introduced a minor issue with respect to the lookaside cache.
> > > > > >
> > > > > > Right now, it's configured to create a tarball from the git tree and
> > > > > > upload it to the lookaside cache for each build. We build the rawhide
> > > > > > kernel every weekday (give or take) and the xz compressed source
> > > > > > tarball is ~110MB. This works out to about 28GB per year for Rawhide
> > > > > > alone (if this is a drop in the bucket and no one cares please let me
> > > > > > know and we'll just do this). The old approach uploaded a release
> > > > > > tarball and then incremental tarballs on top of that.
> > > > > >
> > > > > > If, however, Fedora allowed packagers to optionally generate tarballs
> > > > > > from a git repository we could just push the linux git repository. The
> > > > > > entire repository with history going back 15 years is under 4GB total,
> > > > > > which is pretty good when compared to ~419GB which is the space
> > > > > > required for the equivalent time using the lookaside cache.
> > > > > >
> > > > > > What would need to change:
> > > > > >
> > > > > > * Fedora offers a git repository to push source trees to.
> > > > > >
> > > > > > * A new file in the dist-git repository could be added if the packager
> > > > > >   wishes called "source-repos". In it, it contains a git url and commit
> > > > > >   identifier. For example, an entry might look like:
> > > > > >     "
> > > > > > https://src.fedoraproject.org/sources/kernel.git v5.6"
> > > > > >   where v5.6 is a tag in the repository. We can restrict it so the git
> > > > > >   repository must be hosted by Fedora so we keep all the sources
> > > > > >   forever.
> > > > > >
> > > > > > * fedpkg and fedpkg-minimal would need to be updated to pull the
> > > > > >   source tree if the "source-repos" file is found and run
> > > > > >   "git archive". Fortunately this work is actually already done since
> > > > > >   Red Hat's version of fedpkg already supports this.
> > > > > >
> > > > > > > I'm happy to do all the work for fedpkg/fedpkg-minimal to make this
> > > > > > possible because the other option is to add a bunch of hacks to the
> > > > > > kernel tooling to spit out a bunch of incremental tarballs to reduce
> > > > > > what we have to upload.
> > > > > >
> > > > > > I assume this is something that will need to go through the packaging
> > > > > > SIG, but from an infra side of things are there any thoughts/concerns?
> > > > > >
> > > > >
> > > > > At least with this _specific_ proposal, I don't see too many issues.
> > > > > Adding a "sources" namespace to Pagure and setting up a workflow for
> > > > > that isn't a horrible idea.
> > > > >
> > > > > > I still feel like my general concerns about the original proposal from two
> > > > > years ago[1] haven't been sufficiently addressed. But, given that you
> > > > > seem to have a specific idea in mind here, my questions about this for
> > > > > the kernel (and others that would opt into this workflow):
> > > > >
> > > > > * Are you okay with imposing the same restrictions we have on rpms/*,
> > > > > modules/*, flatpaks/*, and containers/* for sources/*? That is, no
> > > > > rewriting history, no branch deletion, no tag deletion, etc.
> > > > > * Are you okay with blocking the usage of submodules, Git LFS,
> > > > > Git-Annex, or any other mechanism that allows bypassing our
> > > > > protections or cannot be replicated from an upstream repo locally?
> > > >
> > > > I would just like to note that this point is not precise. Usage of git
> > > > submodules (and other technologies) is completely alright if they
> > > > still point to src.fp.o. Is there a source for the point so that I can
> > > > open a PR to fix it?
> > > >
> > >
> > > Making foreign repositories do that isn't straightforward. You would
> > > have to edit the repositories and change all the submodules, download
> > > and reimport all the LFS/Annex objects, etc. And that all tampers with
> > > the repository itself in ways that break the concept of having
> > > pristine trees mirrored to build from.
> >
> > Sorry, I don't understand:
> >
> > you make a git submodule for src.fp.o repo which points to another
> > src.fp.o repo.
> >
> > when you push, there is a hook in src.fp.o that checks if there is any
> > submodule and checks that it has the same origin as the repo the
> > submodule is in (i.e. src.fp.o).
> >
> > and then during build you can clone with `--recurse-submodules`.
> >
> > I don't really understand what you meant with "foreign repositories",
> > downloading LFS/Annex objects etc.
> >
>
> If you're doing mirrors and building from mirrored Git repos (as
> essentially what Jeremy is talking about), what you're suggesting is
> simply not possible or scalable.

Sorry, I still don't get your point.

Jeremy's solution was about introducing a new namespace on src.fp.o
where the mirrored upstream repo would live.

Then it was about introducing the file "source-repos" that contains a
git url and commit identifier and points to that mirrored repository.
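
To make the comparison concrete, here is a minimal sketch of how a tool
might read one "source-repos" entry and build the matching "git archive"
call. The function names, the one-entry-per-line layout, and the tarball
naming are assumptions for illustration, not fedpkg's actual code; the
host restriction follows Jeremy's suggestion that the repository must be
hosted by Fedora.

```python
from urllib.parse import urlparse

# Assumption: only repositories hosted on this server are accepted.
ALLOWED_HOST = "src.fedoraproject.org"


def parse_source_repos_line(line):
    """Split a 'URL REF' entry into (url, ref), raising ValueError if invalid."""
    parts = line.split()
    if len(parts) != 2:
        raise ValueError("expected '<git url> <commit or tag>': %r" % line)
    url, ref = parts
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != ALLOWED_HOST:
        raise ValueError("repository must be hosted on " + ALLOWED_HOST)
    return url, ref


def archive_command(name, ref):
    """Build a 'git archive' command to run inside a local clone of the repo."""
    prefix = "%s-%s/" % (name, ref)
    output = "%s-%s.tar.gz" % (name, ref)
    return ["git", "archive", "--format=tar.gz",
            "--prefix=" + prefix, "-o", output, ref]


url, ref = parse_source_repos_line(
    "https://src.fedoraproject.org/sources/kernel.git v5.6")
print(url, ref)
print(" ".join(archive_command("kernel", ref)))
```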

The "source-repos" file essentially emulates functionality that git
already provides: git submodules.

So I am suggesting that we use git submodules instead of inventing a
new custom solution to the same problem.
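
The push-time check mentioned earlier in the thread (every submodule must
have the same origin as the repo it is in) could look roughly like the
sketch below. It assumes the hook has already read the pushed .gitmodules
content into a string; the function names and the default host are
hypothetical. Relative URLs such as "../other.git" are treated as
same-origin, because git resolves them against the superproject's remote.

```python
import re
from urllib.parse import urlparse

# Matches lines such as "\turl = https://host/path.git" in a .gitmodules file.
URL_LINE = re.compile(r"^\s*url\s*=\s*(\S+)\s*$")


def submodule_urls(gitmodules_text):
    """Return every submodule URL listed in the given .gitmodules content."""
    urls = []
    for line in gitmodules_text.splitlines():
        match = URL_LINE.match(line)
        if match:
            urls.append(match.group(1))
    return urls


def foreign_submodules(gitmodules_text, allowed_host="src.fedoraproject.org"):
    """Return the submodule URLs that do not point at allowed_host."""
    foreign = []
    for url in submodule_urls(gitmodules_text):
        if url.startswith(("./", "../")):
            continue  # relative URL: resolved against the superproject's remote
        # scp-style URLs (user@host:path) have no parsed hostname here,
        # so this sketch conservatively reports them as foreign.
        if urlparse(url).hostname != allowed_host:
            foreign.append(url)
    return foreign


example = (
    '[submodule "linux"]\n'
    "\tpath = linux\n"
    "\turl = https://src.fedoraproject.org/sources/kernel.git\n"
    '[submodule "other"]\n'
    "\tpath = other\n"
    "\turl = https://example.com/other.git\n"
)
print(foreign_submodules(example))
```

A hook built on this would reject the push whenever the returned list is
non-empty.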

Then I was also talking about how it is possible to archive those
submodules by using a dedicated macro (rpkg macro in that case).

Can you explain what is not possible or scalable in that?

>
>
>
>
> --
> 真実はいつも一つ!/ Always, there's only one truth!
> _______________________________________________
> infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx