Re: Time for drm-ci-next?

Rob Clark <robdclark@xxxxxxxxx> · Mon, 10 Mar 2025 07:44:58 -0700

On Mon, Mar 10, 2025 at 4:07 AM Maxime Ripard <mripard@xxxxxxxxxx> wrote:
>
> On Fri, Mar 07, 2025 at 09:26:54AM -0800, Rob Clark wrote:
> > On Fri, Mar 7, 2025 at 9:00 AM Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> > > On Fri, Mar 07, 2025 at 08:42:46AM -0800, Rob Clark wrote:
> > > > On Tue, Sep 24, 2024 at 5:27 AM Vignesh Raman
> > > > > On 12/09/24 11:18, Dmitry Baryshkov wrote:
> > > > > > On Mon, Sep 09, 2024 at 07:34:04AM GMT, Rob Clark wrote:
> > > > > >> On Mon, Sep 9, 2024 at 2:54 AM Dmitry Baryshkov
> > > > > >> <dmitry.baryshkov@xxxxxxxxxx> wrote:
> > > > > >>>
> > > > > >>> On Mon, 9 Sept 2024 at 10:50, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> > > > > >>>>
> > > > > >>>> Hi,
> > > > > >>>>
> > > > > >>>> On Tue, Jul 09, 2024 at 01:27:51AM GMT, Dmitry Baryshkov wrote:
> > > > > >>>>> On Mon, 8 Jul 2024 at 21:38, Rob Clark <robdclark@xxxxxxxxx> wrote:
> > > > > >>>>>>
> > > > > >>>>>> On Mon, Jul 8, 2024 at 1:52 AM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> On Fri, Jul 05, 2024 at 12:31:36PM -0700, Rob Clark wrote:
> > > > > >>>>>>>> On Fri, Jul 5, 2024 at 3:36 AM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Thu, Jul 04, 2024 at 08:40:26AM -0700, Rob Clark wrote:
> > > > > >>>>>>>>>> On Thu, Jul 4, 2024 at 7:08 AM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Tue, Jul 02, 2024 at 05:32:39AM -0700, Rob Clark wrote:
> > > > > >>>>>>>>>>>> On Fri, Jun 28, 2024 at 10:44 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Thu, Jun 27, 2024 at 11:51:37AM -0700, Rob Clark wrote:
> > > > > >>>>>>>>>>>>>> On Wed, Jun 26, 2024 at 10:47 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> On Wed, Jun 26, 2024 at 11:38:30AM +0300, Dmitry Baryshkov wrote:
> > > > > >>>>>>>>>>>>>>>> On Wed, Jun 26, 2024 at 09:32:44AM GMT, Daniel Vetter wrote:
> > > > > >>>>>>>>>>>>>>>>> On Mon, Jun 24, 2024 at 10:25:25AM -0300, Helen Koike wrote:
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On 24/06/2024 02:34, Vignesh Raman wrote:
> > > > > >>>>>>>>>>>>>>>>>>> Hi,
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> On 15/03/24 22:50, Rob Clark wrote:
> > > > > >>>>>>>>>>>>>>>>>>>> Basically, I often find myself needing to merge CI patches on top of
> > > > > >>>>>>>>>>>>>>>>>>>> msm-next in order to run CI, and then after a clean CI run, reset HEAD
> > > > > >>>>>>>>>>>>>>>>>>>> back before the merge and force-push.  Which isn't really how things
> > > > > >>>>>>>>>>>>>>>>>>>> should work.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> This sounds more like you want an integration tree like drm-tip. Get msm
> > > > > >>>>>>>>>>>>>>>>> branches integrated there, done. Backmerges just for integration testing
> > > > > >>>>>>>>>>>>>>>>> are not a good idea indeed.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> But AFAIU this doesn't help for pre-merge testing, ie. prior to a
> > > > > >>>>>>>>>>>>>> patch landing in msm-next
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> My idea was to have a drm-ci-next managed similar to drm-misc-next, if
> > > > > >>>>>>>>>>>>>> we have needed drm/ci patches we could push them to drm-ci-next, and
> > > > > >>>>>>>>>>>>>> then merge that into the driver tree (along with a PR from drm-ci-next
> > > > > >>>>>>>>>>>>>> to Dave).
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> I guess I'm confused about what kind of pre-merge testing we're talking
> > > > > >>>>>>>>>>>>> about then ... Or maybe this just doesn't work too well with the linux
> > > > > >>>>>>>>>>>>> kernel. The model is that you have some pile of trees, they're split up,
> > > > > >>>>>>>>>>>>> and testing of all the trees together is done in integration trees like
> > > > > >>>>>>>>>>>>> linux-next or drm-tip.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> pre-merge: for msm we've been collecting up patches from list into a
> > > > > >>>>>>>>>>>> fast-forward MR which triggers CI before merging to msm-next/msm-fixes
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Ideally drm-misc and other trees would do similar, we'd catch more
> > > > > >>>>>>>>>>>> regressions that way.  For example, in msm-next the nodebugfs build is
> > > > > >>>>>>>>>>>> currently broken, because we merged drm-misc-next at a time when
> > > > > >>>>>>>>>>>> komeda was broken:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> https://gitlab.freedesktop.org/drm/msm/-/jobs/60575681#L9520
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> If drm-misc was using pre-merge CI this would have been caught and the
> > > > > >>>>>>>>>>>> offending patch dropped.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> That sounds more like we should push on the drm-misc pre-merge CI boulder
> > > > > >>>>>>>>>>> to move it uphill, than add even more trees to make it even harder to get
> > > > > >>>>>>>>>>> there long term ...
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Short term it helps locally to have finer trees, but only short term and
> > > > > >>>>>>>>>>> only very locally.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> The path to have fewer trees (ideally only one for all of drm) is to
> > > > > >>>>>>>>>> use gitlab MRs to land everything :-)
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> drm-ci-next is only a stop-gap.. but one that we need.  The
> > > > > >>>>>>>>>> ${branchname}-external-fixes trick covers _most_ cases where we need
> > > > > >>>>>>>>>> unrelated patches (ie. to fix random ToT breakage outside of drm or in
> > > > > >>>>>>>>>> core drm), but it doesn't help when the needed changes are yml
> > > > > >>>>>>>>>> (because gitlab processes all the yml before merging the
> > > > > >>>>>>>>>> -external-fixes branch).  This is where we need drm-ci-next, otherwise
> > > > > >>>>>>>>>> we are having to create a separate MR which cherry-picks drm/ci
> > > > > >>>>>>>>>> patches for doing the CI.  This is a rather broken process.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> So what I don't get is ... if we CI drm-misc, how does that not help
> > > > > >>>>>>>>> improve the situation here? Step one would be post-merge (i.e. just enable
> > > > > >>>>>>>>> CI in the repo), then get MRs going.
> > > > > >>>>>>>>
> > > > > >>>>>>>> I guess post-merge is better than nothing.. but pre-merge is better.
> > > > > >>>>>>>>
> > > > > >>>>>>>> post-merge can work if you have a "sheriff" system where someone
> > > > > >>>>>>>> (perhaps on a rotation) is actively monitoring results and "revert and
> > > > > >>>>>>>> ask questions later" when something breaks.  Pre-merge ensures the
> > > > > >>>>>>>> interested party is involved in the process ;-)
> > > > > >>>>>>>
> > > > > >>>>>>> So ... make that happen? And it doesn't have to be for all of drm-misc,
> > > > > >>>>>>> mesa after all switched over to MR also on a bit a driver/area basis. So
> > > > > >>>>>>> agreeing among all drm-ci folks to use gitlab MR in drm-misc for pre-merge
> > > > > >>>>>>> testing shouldn't be that hard to make happen. And unlike a separate
> > > > > >>>>>>> branch it's not some kind of detour with a good chance to get stuck in a
> > > > > >>>>>>> local optimum.
> > > > > >>>>>>
> > > > > >>>>>> Tree vs branch doesn't really have much in the way of distinction,
> > > > > >>>>>> modulo gitlab permissions.  In that it doesn't do much good if drm/ci
> > > > > >>>>>> patches are landing on a different branch.
> > > > > >>>>>>
> > > > > >>>>>> I guess what you are suggesting is that we have a single tree/branch
> > > > > >>>>>> that drm/ci + drm/msm + (plus whoever else wants to get in on the
> > > > > >>>>>> drm/ci, so probably at least vkms) lands patches into via gitlab MRs?
> > > > > >>>>>
> > > > > >>>>> This again brings a separate CI-enabled tree. I think it has been
> > > > > >>>>> suggested to start with enabling DRM CI for drm-misc branches. Then we
> > > > > >>>>> can possibly start landing MRs with CI testing (probably starting with
> > > > > >>>>> vkms).
> > > > > >>>>
> > > > > >>>> It's something we've discussed with Sima for a while, but enabling CI on
> > > > > >>>> drm-misc at some point will make sense so we could just as well start
> > > > > >>>> now.
> > > > > >>>>
> > > > > >>>> The biggest unknown at the moment to start doing so is how we could keep
> > > > > >>>> drm-tip and the rerere repo with MR.
> > > > > >>>
> > > > > >>> Let's do a slow start and begin with post-merge testing. At least this
> > > > > >>> gives us an idea of how stable it is (and what does it take to keep it
> > > > > >>> green). Maybe we can perform post-merge testing for both drm-misc and
> > > > > >>> drm-tip.
> > > > > >>
> > > > > >> The one thing is that currently the runtime for igt is quite long
> > > > > >> (~1hr) because you can't really parallelize kms tests.  So maybe
> > > > > >> nightly scheduled runs would be a better idea.
> > > > > >
> > > > > > SGTM. So, the question would be, who can set it up?
> > > > > >
> > > > >
> > > > > We will test the nightly pipelines in a forked repo and then will
> > > > > set it up for drm-misc and drm-tip.
> > > > >
> > > >
> > > > Revisiting this old thread...
> > > >
> > > > It's becoming increasingly clear that landing drm/ci changes via drm-misc
> > > > (where gitlab CI is not used) isn't working out.
> > >
> > > Why?
> >
> > Some of it has been breakage in drm/ci, which would have been noticed
> > if drm/ci went thru CI.
> >
> > And some of it is that expectation changes in other drivers when we
> > backmerge drm-next/drm-misc-next aren't cared for.  So we end up not
> > enforcing a green-pipeline, and just ignoring failed jobs on other
> > drivers.  I guess we could just make expectation updates for other
> > drivers as part of drm/msm MRs, but that feels a bit strange.
>
> Yeah, ok, I can see how that can happen and how it can be frustrating :)
>
> > Related question, should we just smash expectation updates into the
> > merge commit?  Or keep it a separate commit?  In the latter case, I
> > think the expectation update commit should not need a r-b on list,
> > since it is just updating expectations to the reality of changes
> > merged elsewhere without a CI run.
>
> My gut feeling is that we should review that change still, because
> people will mess the expectations up just to get that green light. But
> both suck, really, so I'm not really sure.

Either way I'd want to notify maintainers of the respective drivers.
If we need to add a new expected fail, due to a backmerge, the change
that caused the failure has already been merged.  The expectations
update is just to reflect reality, so not really much to review.

And I should mention, at this point I'm not proposing opening up MRs
to _everyone_, just to driver maintainers of whichever drivers want
to participate.

> > > Also, tbf, drm-misc doesn't use gitlab CI because nobody solved the
> > > drm-tip issue. And because the mail I sent 6 months ago about didn't get
> > > any feedback.
> >
> > If gitlab MRs were used to land changes in drm-misc, this would solve
> > the problem.  I'd be happy to start landing drm/msm changes through
> > that path.
> >
> > But I don't think there is any good way to mix/match gitlab MRs with
> > other processes (dim, or raw git) in the same tree.  We need to be
> > able to enforce a green CI run to land changes.
>
> So, I agree, and it's been the intent to make drm-misc this since even
> before we moved to Gitlab.
>
> But there's different things to consider:
>
> - We frequently enough merge patches affecting other subsystems that
>   will need to be notified, and will need to review our work. Just
>   saying that a random maintainer that gets 2 MR each year from us
>   should watch our Gitlab frequently doesn't work. So, how do we solve
>   that? Do we take it to the ML, and once we get their AB, we merge it
>   through a MR?
>
> - The kernel kind of mandates of maintainers / committers to put their
>   Signed-off-by, how do we automate that in Gitlab?
>
> - Similarly, Linus *really* likes signed PRs, how do we make one
>   switching to Gitlab?
>
> - Do we require people to still use dim, or do we just drop dim?
>
> - Do we have enough runners to begin with?
>
> - And the final one that really stinks: how do we make sure we send the
>   right branches to linux-next?
>
> So far, nobody really decided or solved those things. And we would have
> to solve this (except for dim) for the tree you're suggesting, because
> most of it really is about dealing with the impedance mismatch with the
> rest of the kernel.

What we've been doing with drm/msm (since v5.18) is pretty transparent
to patch submitters and PR consumers:

1) Patches sent to list and reviewed on list

2) Driver maintainers (Dmitry, Abhinav, and myself) take turns
   collecting up reviewed patches from list into an MR

3) The gitlab tree is set to only allow ff merges, which sidesteps
   the signoff/etc issues.  It would be nice if gitlab (or margebot?
   Or something?) could apply the s-o-b's, etc.  We'd need that if
   we wanted to open things up for submitters to send MRs directly.

4) Gitlab CI runs on the MR..  so in addition to ad-hoc local testing,
   we have on-device igt runs, a variety of build tests, etc.  (I'd
   like to get to having deqp runs with a known good mesa, but we
   aren't there yet.)

   This part is still a bit messy because we frequently have to pull
   in drm/ci fixes, etc.  Which is what I'd like to fix by having a
   drm-gitlab tree which is used for drm/ci and drm/msm patches (and
   open for other drivers to join in).

   We do have an ${target_branch}-external fixes mechanism which we
   can use for required fixes that would land via some other tree,
   in case (for ex) one of the CI boards doesn't boot properly without
   some fix in another subsystem.  But this has the limitation that
   it can't deal with yaml changes, since the branch is merged in the
   build job, after gitlab has already slurped in all the .yml

5) PRs sent to Dave and Sima are tagged and sent as normal.  It isn't
   automated (yet), but the PR notes are collected up from however
   many MRs were merged.  (Typically one or two for display side
   changes from Dmitry/Abhinav and maybe one or two for GEM/GPU/etc
   from myself.)

You can sum all that up as just inserting an MR step into the process.
Ie. rather than push directly to msm-next, I push to msm-next-robclark.
Which then gets merged thru gitlab after a CI run.

It might be possible to implement a similar thing with dim, ie. the
drm-misc maintainers push to drm-misc-next-staging which gets merged
into drm-misc-next after an MR pipeline?

> We talked about it some time ago with Dave, and we concluded that the
> best course of action was to create a few actions for DRM to run when we
> send PR (like a build test and kunit run), and then expand and figure
> things out as we use it more and more.

Build test and kunit can accomplish _some_ things.. but kunit in
particular can only test the hw you have.  But gitlab CI can leverage
the CI farm we already have in place for mesa, and run on-device CI
across a range of DUTs.  I don't see how you can replicate that any
other way.

(Not to mention the side benefits of having a record of test runs and
artifacts.)

BR,
-R

> And that would mean doing a transition phase.
>
> > > > On the drm/msm side, we pretty regularly end up needing a 2nd dummy MR
> > > > with additional drm/ci, etc patches on top for running our CI
> > > > pipelines, which is really _not_ the way this is supposed to work.
> > > >
> > > > So I think we want a single tree to merge drm/ci, drm/msm, and changes for
> > > > any other driver that wants to participate in the CI process.  We could
> > > > call it drm-gitlab or drm-ci or whatever.  The rules would be:
> > > >
> > > > * _Only_ land changes via MR with passing CI pipeline.  We should configure
> > > >   the gitlab tree to disallow MRs without a green pipeline.
> > > >
> > > > * All drm/ci changes go thru this tree.
> > > >
> > > > * When we need to backmerge drm-next/files or drm-misc-next/fixes, that
> > > >   goes via an MR into this shared tree, just like any other change.
> > > >
> > > >   If there are expectation updates (tests start to fail or pass, we make
> > > >   the fails/flakes/skips changes on the same MR, no questions asked.
> > > >   (But would be polite to tag the associated driver maintainer on the
> > > >   MR for visibility.)
> > > >
> > > > * Once we've done this, we could conceivable use similar file-path rules
> > > >   like mesa does to only run applicable jobs.  Ie. if the MR only touches
> > > >   drm/msm there is no need to run i915 CI jobs.  So we could optimize
> > > >   the CI runner utilization this way.
> > > >
> > > > * Only ff-merges.  At least to start with it would be only driver
> > > >   maintainers submitting MRs with patches collected up from list, so
> > > >   it would be few enough people that this shouldn't be a problem to
> > > >   coordinate.
> > > >
> > > > * I'd be open to the idea of allowing drm core and cross-driver changes
> > > >   thru this tree.  These are especially the sorts of things that we'd
> > > >   like to see have a clean CI pipeline.  But not sure how this would
> > > >   conflict with drm-misc.  One possible future is that it replaces
> > > >   drm-misc eventually.
> > >
> > > And we get back to the drm-tip discussion here. If you want to do any
> > > meaningful cross-driver or core changes, you need to have an integration
> > > tree like drm-tip. And if we solve that, afaict, we solve the only
> > > obstacle to using gitlab ci in drm-misc so we might just take drm/msm
> > > in instead.
> >
> > I definitely agree about the usefulness of an integration tree.  And
> > an integration tree is where CI is especially valuable.  So I'm open
> > to ideas to have gitlab CI integrated in this tree.  But I don't see a
> > good alternative to requiring a green CI pipeline for merging changes
> > into this tree, which means using gitlab MRs.  Maybe I'm not thinking
> > creatively enough here, but landing changes without a green CI run
> > just means discovering unrelated breakages when the next batch of
> > changes comes in via an MR.
>
> To a large extent, that pain is here because msm is an early adopter.
> It's up to you to decide whether the trade-off is worthy, but forcing it
> down to everybody without address what prevents everybody else from
> switching doesn't really work.
>
> It worked for you, with a smaller set of constraints, but we just can't
> get rid of them.
>
> Maxime