Re: Making IGT runnable by CI and developers

Daniel Vetter <daniel@xxxxxxxx> · Mon, 24 Jul 2017 11:27:31 +0200

On Mon, Jul 24, 2017 at 09:15:28AM +0100, Tvrtko Ursulin wrote:
> 
> On 21/07/2017 16:45, Daniel Vetter wrote:
> > On Fri, Jul 21, 2017 at 12:56 PM, Tvrtko Ursulin
> > <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
> > > 
> > > On 20/07/2017 17:23, Martin Peres wrote:
> > > > 
> > > > Hi everyone,
> > > > 
> > > > As some of you may already know, we have made great strides in making our
> > > > CI system usable, especially in the last 6 months when everything started
> > > > clicking together.
> > > > 
> > > > The CI team is no longer overwhelmed with fires and bug reports, so we
> > > > started working on increasing the coverage from just fast-feedback, to a
> > > > bigger set of IGT tests.
> > > > 
> > > > As some of you may know, running IGT has been a challenge that few manage
> > > > to overcome. Not only is the execution time counted in machine months, but
> > > > it can also lead to disk corruption, which does not encourage developers to
> > > > run it either. One test takes 21 days, on its own, and it is a subset of
> > > > another test which we never ran for obvious reasons.
> > > > 
> > > > I would thus like to get the CI team and developers to work together to
> > > > decrease sharply the execution time of IGT, and get these tests run multiple
> > > > times per day!
> > > > 
> > > > There are three usages that the CI team envision (up for debate):
> > > >    - Basic acceptance testing: Meant for developers and CI to check quickly
> > > > if a patch series is not completely breaking the world (< 10 minutes,
> > > > timeout per test of 30s)
> > > >    - Full run: Meant to be run overnight by developers and users (< 6
> > > > hours)
> > > 
> > > 
> > > We could start by splitting this budget to logical components/teams.
> > > 
> > > So far we have been talking about GEM and KMS, but I was just thinking that
> > > we may want to have separate units at this level for the likes of power
> > > management, DRM (core), external stuff like sw fences? TBD I guess.
> > > 
> > > Assuming a GEM/KMS split only, the fair thing seems to be to split the time
> > > budget 50-50 and let the respective teams start working.
> > 
> > Yes, KMS is also not perfect, but there it's maybe a factor of 2x that
> > it's taking too long. GEM is 50x or worse. Also note KMS includes
> > everything, so core drm, PM tests. 2x is something that can be fixed as we
> > go, which is good, since it means we should be able to pre-merge test
> > any changes to igt before pushing. GEM is not even close.
> > 
> > > I assume this is x hours on the slowest machine?
> > > 
> > > Teams would also need easy access to up-to-date test run times.
> > 
> > Right now you can't have that for GEM, because it takes 24d. That
> > means 1 run of GEM takes away 50 runs of everything else (need to
> > check, it might be worse). There's simply no way we can even hand out
> > that data without blocking pre-merge CI for everyone else.
> > 
> > We might be able to schedule the occasional manual run over the w/e,
> > but that's about it.
> 
> I did not explain well here what I was thinking about by access to
> up-to-date runtime. I assumed we would start from a cut down list, the one
> which fits in the time budget.
> 
> As Martin and I chatted on Friday, I would be completely fine with the CI
> team just picking a list of GEM tests which fits, and then the GEM team's
> responsibility is to add, remove and improve tests until this time is used
> in the most optimal way.
> 
> This way we would be getting daily test run time updates.
> 
> We also talked about the idea to set up an IGT trybot, where we could send
> test changes, followed by a testlist update, and so see the specific test
> runtimes across the platforms.
> 
> Once that looks ok, we could submit a patch to the real test list and so
> keep iterating until the above goal is reached.

Atm we have 99% of GEM stuff that we simply cannot run. I don't think it's
a good idea to carry that around forever (simply because enumerating all
these tests alone kills machine time if you try to run stuff locally). Is
the plan to not clean that up?

2nd issue I have with an explicit gem test suite: New testcases won't get
tested by default, which means no pressure on them to be fast or useful or
stable. That's imo a big reason for why we ended up here. So if you think
an explicit gem test list is the way to go, then I think the only way to
do that is with a blacklist (which would start out with all gem tests).

And after a few months we'd just go through the sources and delete all the
tests still blacklisted, or something like that.
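
To make the whitelist vs. blacklist difference concrete, here is a rough
sketch. The test names and the select_tests() helper are made up for
illustration; this is not how piglit or any existing igt runner is
implemented. The point is only that with a blacklist a newly added test
runs by default, without anyone having to touch a list.

```python
def select_tests(all_tests, blacklist):
    """Return the tests to run: everything enumerated, minus blacklisted names."""
    blocked = set(blacklist)
    return [name for name in all_tests if name not in blocked]


# Hypothetical test names, for illustration only.
all_tests = ["gem_example_a", "gem_example_b", "kms_example_a"]

# The blacklist starts out containing every existing GEM test.
blacklist = ["gem_example_a", "gem_example_b"]

print(select_tests(all_tests, blacklist))

# A newly added GEM test is not on the blacklist, so it gets picked up
# automatically on the next run.
all_tests.append("gem_example_new")
print(select_tests(all_tests, blacklist))
```

With an explicit list of tests to run, the second call would return the
same thing as the first until someone remembers to add the new test.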
-Daniel 
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch