On Wed, 30 Aug 2023 at 17:14, Helen Koike <helen.koike@xxxxxxxxxxxxx> wrote:
>
>
>
> On 30/08/2023 11:57, Maxime Ripard wrote:
> > On Wed, Aug 30, 2023 at 10:24:49AM -0300, Helen Koike wrote:
> >> Hi all,
> >>
> >> Thanks for your comments.
> >>
> >> On 30/08/2023 08:37, Maxime Ripard wrote:
> >>> On Wed, Aug 30, 2023 at 01:58:31PM +0300, Jani Nikula wrote:
> >>>> On Wed, 30 Aug 2023, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> >>>>> On Tue, Aug 22, 2023 at 04:26:06PM +0200, Daniel Vetter wrote:
> >>>>>> On Fri, Aug 11, 2023 at 02:19:53PM -0300, Helen Koike wrote:
> >>>>>>> From: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx>
> >>>>>>>
> >>>>>>> Developers can easily execute several tests on different devices
> >>>>>>> by just pushing their branch to their fork in a repository hosted
> >>>>>>> on gitlab.freedesktop.org which has an infrastructure to run jobs
> >>>>>>> in several runners and farms with different devices.
> >>>>>>>
> >>>>>>> There are also other automated tools that uprev dependencies,
> >>>>>>> monitor the infra, and so on that are already used by the Mesa
> >>>>>>> project, and we can reuse them too.
> >>>>>>>
> >>>>>>> Also, store expectations about what the DRM drivers are supposed
> >>>>>>> to pass in the IGT test suite. By storing the test expectations
> >>>>>>> along with the code, we can make sure both stay in sync with each
> >>>>>>> other so we can know when a code change breaks those expectations.
> >>>>>>>
> >>>>>>> Also, include a configuration file that points to the out-of-tree
> >>>>>>> CI scripts.
> >>>>>>>
> >>>>>>> This will allow all contributors to drm to reuse the infrastructure
> >>>>>>> already in gitlab.freedesktop.org to test the driver on several
> >>>>>>> generations of the hardware.
> >>>>>>>
> >>>>>>> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx>
> >>>>>>> Signed-off-by: Helen Koike <helen.koike@xxxxxxxxxxxxx>
> >>>>>>> Acked-by: Daniel Stone <daniels@xxxxxxxxxxxxx>
> >>>>>>> Acked-by: Rob Clark <robdclark@xxxxxxxxx>
> >>>>>>> Tested-by: Rob Clark <robdclark@xxxxxxxxx>
> >>>>>>
> >>>>>> Ok I pushed this into a topic/drm-ci branch in drm.git and asked sfr to
> >>>>>> include that branch in linux-next.
> >>>>>>
> >>>>>> But also I'd like to see a lot more acks here, we should be able to at
> >>>>>> least pile up a bunch of (driver) maintainers from drm-misc in support of
> >>>>>> this. Also maybe media, at least I've heard noises that they're maybe
> >>>>>> interested too? Plus anyone else, the more the better.
> >>>>>
> >>>>> I'm not really convinced by that approach at all, and most of the issues
> >>>>> I see are shown by the follow-up series here:
> >>>>
> >>>> I'm not fully convinced either, more like "let's see". In that narrow
> >>>> sense, ack. I don't see harm in trying, if you're also open to backing
> >>>> off in case it does not pan out.
> >>>>
> >>>>> https://lore.kernel.org/dri-devel/20230825122435.316272-1-vignesh.raman@xxxxxxxxxxxxx/
> >>>>>
> >>>>> * We hardcode a CI farm setup into the kernel
> >>
> >>
> >> These could be out of tree.
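As a rough sketch of the expectations mechanism the commit message above describes (the directory layout, file names, and result format below are assumptions for illustration, not the actual drm-ci scripts), a runner-side check against in-tree expectation lists could look roughly like this:

#!/usr/bin/env python3
"""Minimal sketch of "test expectations stored with the code".
Paths, list names and the results format are hypothetical."""

import sys


def load_list(path):
    """Read one test name per line, ignoring blanks and '#' comments."""
    try:
        with open(path) as f:
            lines = (line.strip() for line in f)
            return {l for l in lines if l and not l.startswith("#")}
    except FileNotFoundError:
        return set()


def check(results, xfails_dir, platform):
    """results: dict of test name -> 'pass'/'fail'.
    Returns False if the run deviates from the in-tree expectations."""
    expected_fails = load_list(f"{xfails_dir}/{platform}-fails.txt")
    skips = load_list(f"{xfails_dir}/{platform}-skips.txt")

    regressions = []
    fixed = []
    for name, outcome in results.items():
        if name in skips:
            continue  # listed as skipped on this platform (e.g. it crashes the machine)
        if outcome == "fail" and name not in expected_fails:
            regressions.append(name)   # new breakage -> CI job goes red
        if outcome == "pass" and name in expected_fails:
            fixed.append(name)         # expectations file is now stale

    for name in regressions:
        print(f"unexpected failure: {name}")
    for name in fixed:
        print(f"unexpected pass (drop from {platform}-fails.txt?): {name}")
    return not regressions


if __name__ == "__main__":
    # Toy run: two results checked against hypothetical expectation files.
    ok = check({"kms_hdmi_inject@inject-4k": "fail",
                "kms_addfb_basic@addfb25-bad-modifier": "pass"},
               "drivers/gpu/drm/ci/xfails", "mediatek-mt8173")
    sys.exit(0 if ok else 1)

The point of such a scheme is that an unexpected pass is surfaced just as loudly as an unexpected failure, so stale entries in the fails list do not linger unnoticed.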
> >>
> >> There is a version outside the kernel tree where you just point the CI
> >> configuration to a url:
> >> https://gitlab.freedesktop.org/gfx-ci/drm-ci/-/merge_requests/1
> >>
> >> We were discussing it here https://www.linuxtv.org/cgi-bin/mailman/private/linuxtv-ci/2023-August/000027.html
> >
> > It looks like it's private
> >
> >> (I guess Sima's reply didn't get into the mailing list) but the argument for
> >> not having an out-of-tree repo is due to historical bad experience of having to
> >> sync the kernel with the code, and it can become messy.
> >
> > My point is that even though the test strategy might be considered a
> > "property" of the kernel, how you execute it is definitely not and you
> > will have as many setups as you have CI farms. You can't put that into
> > the kernel, just like we don't put the kernel command line in it for
> > example.
>
> >>>>>
> >>>>> * We cannot trust that the code being run is actually the one being
> >>>>> pushed into gitlab
> >>
> >>
> >> We can improve this if this is a requirement.
> >> For DTS configuration we can work with overlays (which is the current
> >> modification on that patchset). For other changes that are not suitable for
> >> upstream (and should be rare) we can see if we work with the
> >> `-external-fixes` approach or another approach; we can check it case by case
> >> to understand why it is not suitable for upstream.
> >
> > The existence of that branch in itself is an issue to me. Again, it's a
> > matter of trust. How can I trust a branch I barely know about, whose
> > development is not clear and isn't reviewed by any of the
> > maintainers of the code that might affect the test outcomes?
> >
> > Or put another way, if I run the tests on my machine, it won't work. Why
> > should it work on the CI farm? The branch itself is broken. It might not
> > be due to any of the work I did, but it's broken still.
> >
> >>>>>
> >>>>> * IMO, and I know we disagree here, any IGT test we enable for a given
> >>>>> platform should work, period. Allowing failures and flaky tests just
> >>>>> sweeps whatever issue is there under the rug. If the test is at
> >>>>> fault, we should fix the test; if the driver / kernel is at fault,
> >>>>> then I certainly want to know about it.
> >>
> >> I believe we need a baseline and understand the current status of tests. If
> >> you check the xfails folder in the patch you can see that I had to add a few
> >> tests to *-skips.txt since those tests crash the system and others to
> >> *-fails.txt that are consistently not passing.
> >
> > I agree that we need a baseline, but that baseline should be defined by
> > the tests' own merits, not their outcome on a particular platform.
> >
> > In other words, I want all drivers to follow that baseline, and if they
> > don't it's a bug we should fix, and we should be vocal about it. We
> > shouldn't ignore the test because it's broken.
> >
> > Going back to the example I used previously, kms_hdmi_inject@inject-4k
> > shouldn't fail on mt8173, ever. That's a bug. Ignoring it and reporting
> > that "all tests are good" isn't ok. There's something wrong with that
> > driver and we should fix it.
> >
> > Or at the very least, explain in much detail what the breakage is, how
> > we noticed it, why we can't fix it, and how to reproduce it.
> >
> > Because in its current state, there's no chance we'll ever go over that
> > test list and remove some of them. Or even know whether, if we ever fix a bug
> > somewhere, we should remove a flaky or failing test.
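To make that last point concrete in code form (this is only a sketch of one possible convention, with an invented "# key: value" annotation format, not anything drm-ci actually implements), a checker could refuse expectation entries that don't document the breakage, how to reproduce it, and why it isn't fixed:

"""Hypothetical validator for annotated *-fails.txt / *-flakes.txt lists.
The annotation keys and file format are invented for illustration."""

REQUIRED = ("bug", "reproduce", "reason")  # breakage report, how to hit it, why it's still listed


def validate(path):
    """Return a list of problems for entries that lack annotations."""
    problems = []
    pending = {}  # annotations collected since the previous test name
    with open(path) as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.strip()
            if not line:
                continue
            if line.startswith("#"):
                key, _, value = line.lstrip("#").partition(":")
                pending[key.strip()] = value.strip()
                continue
            missing = [k for k in REQUIRED if not pending.get(k)]
            if missing:
                problems.append(f"{path}:{lineno}: {line} is missing {', '.join(missing)}")
            pending = {}
    return problems


if __name__ == "__main__":
    # Example list contents (invented), annotations preceding the test name:
    #   # bug: link to the tracking issue for the breakage
    #   # reproduce: run kms_hdmi_inject@inject-4k on the mt8173 machine in the lab
    #   # reason: 4k mode rejected by the driver, fix still under discussion
    #   kms_hdmi_inject@inject-4k
    import sys
    if len(sys.argv) != 2:
        sys.exit("usage: check_xfails.py <expectations-file>")
    errors = validate(sys.argv[1])
    print("\n".join(errors) or "all entries documented")
    sys.exit(1 if errors else 0)

With something along these lines, going back over the list later at least has a starting point: each entry says what to retest and where the discussion lives.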
> >
> >> Since the "any IGT test we enable for a given platform should work" is not a
> >> reality atm,
> >
> > Thanks for the reality check, but it's very much doable: we're in
> > control of the test suite.
> >
> >> we need to have a clear view of which tests are not meeting it,
> >> so we can start fixing. First we need to be aware of the issues
> >> so we can start fixing them; otherwise we will stay in the "no tests,
> >> no failures" ground :)
> >
> > I think we have somewhat contradicting goals. You want to do
> > regression testing, so whatever test used to work in the past should
> > keep working. That's fine, but it's different from "expectations about
> > what the DRM drivers are supposed to pass in the IGT test suite", which
> > is about validation, i.e. "all KMS drivers must behave this way".
>
> I see. Indeed, for me it is more about regression testing.
>
> We could have a configuration where developers could choose to run
> regression tests or overall validation (but I understand this is not the
> point, just saying we could have both somehow).
>
> We could have some policy: if you want to enable a certain device in the
> CI, you need to make sure it passes all tests first, to force people to
> go fix the issues, but maybe it would be a big barrier.
>
> I'm afraid that, if a test fails (and it is a clear bug), people would
> just say "it works for most of the cases, this is not a priority to fix" and
> just start ignoring the CI; this is why I think regression testing is a
> good way to start with.
>
> Anyway, just brainstorming :)
> I really hope we can find a nice useful solution for the community.

I think eventually we need to get to both goals, but currently driver
and test quality just isn't remotely there.

I think a good approach would be if CI work focuses on the pure sw tests
first, so kunit and running igt against vgem/vkms. And then we could use
that to polish a set of must-pass igt testcases, which also drivers in
general are supposed to pass. Plus ideally weed out the bad igts that
aren't reliable enough or have bad assumptions.

For hardware I think it will take a very long time until we get to a
point where CI can work without a test result list, we're nowhere close
to that. But for virtual drivers this really should be achievable,
albeit with a huge amount of effort required to get there I think.

Cheers, Sima

> Regards,
> Helen
>
> >
> > I guess for regression you very much would like that all-green
> > dashboard, and it's understandable. For validation, we don't care and we
> > should be as vocal as possible to report broken drivers.
> >
> > Eventually, we should have regression testing over the validation test
> > suite.
> >
> > It's not about reality. We should be clear what we expect from those
> > test suites, and not claim that it's something it's not.
> >
> >>>> At least for display, where this also depends on peripheral hardware,
> >>>> it's not an easy problem, really.
> >>>
> >>> Aside from the Chamelium tests, which tests actually rely on peripheral
> >>> hardware? On EDID and hotplug, sure, but that can easily be set up from
> >>> the userspace, or something like
> >>>
> >>> https://www.lindy-international.com/HDMI-2-0-EDID-Emulator.htm?websale8=ld0101.ld021102&pi=32115
> >>>
> >>>> How reliable do you need it to be? How many nines? Who is going to
> >>>> debug the issues that need hundreds or thousands of runs to reproduce?
> >>>> If a commit makes some test less reliable, how long is it going to
> >>>> take to even see that or pinpoint that?
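For a rough sense of scale behind those questions (plain probability, not tied to any particular CI setup): if a test flakes with probability p on each run, the chance of seeing at least one failure in n independent runs is 1 - (1 - p)^n, so the number of runs needed just to notice the flake is:

"""Back-of-the-envelope numbers for the reliability questions above."""

import math


def runs_to_observe(flake_rate, confidence=0.95):
    """Runs needed to see at least one failure with the given confidence,
    assuming each run fails independently with probability flake_rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - flake_rate))


if __name__ == "__main__":
    for rate in (0.1, 0.01, 0.001):
        n = runs_to_observe(rate)
        print(f"flake rate {rate:>6.1%}: ~{n} runs to notice it (95% confidence)")
    # flake rate  10.0%: ~29 runs to notice it (95% confidence)
    # flake rate   1.0%: ~299 runs to notice it (95% confidence)
    # flake rate   0.1%: ~2995 runs to notice it (95% confidence)

Which is roughly why a regression that only shows up once in a thousand runs is unlikely to be spotted, let alone bisected, from per-push CI results alone.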
> >>>
> >>> I mean, that's also true for failures or success then. How many times do
> >>> you need a test to run properly to qualify it as a meaningful test? How
> >>> do you know that it's not a flaky test?
> >>>
> >>> Ultimately, it's about trust. If, for a given test that just failed, I
> >>> can't be certain that it's because of the branch I just submitted, I
> >>> will just ignore the test results after a while.
> >>>
> >>> This is already what plagues kernelci, and we should do better.
> >>
> >> This is something that is really nice on Mesa3D: a patch only gets merged if
> >> tests pass, which forces people not to ignore it, which forces the code to
> >> be fixed and the CI to be constantly maintained.
> >>
> >> Of course there are bad days there, but there is real value. Nice thread to
> >> check: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8635 (thanks Alyssa
> >> for the feedback).
> >
> > I'm sure it works great for Mesa, but I'm also sure it doesn't ignore
> > CTS reports that a particular device isn't a valid OpenGL or Vulkan
> > implementation anymore.
> >
> >>> And I'm sorry, but if some part of the kernel or driver just isn't
> >>> reliable, then we shouldn't claim it is (except for all the times it
> >>> isn't). If no-one has the time to look into it, fine, but flagging it
> >>> as a flaky test doesn't help anyone.
> >>
> >> At least we would know what is there that isn't reliable.
> >
> > We would too if the test was reported as failed. But our preferred
> > approaches to doing so diverge.
> >
> > Maxime

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch