On Tue, 2016-07-05 at 22:17 +0300, Mohammed Tayeh wrote:
> Hi
> We need some to explain how to use openQA for me and new member
> in a short steps 😀😀

Hi Mohammed! :)

openQA isn't exactly something you 'use', in most cases. It's an automated test system that we've been using for the last couple of cycles; the main goal was to reduce the manual release validation testing workload. So all the tests we've implemented in openQA so far are automated versions of the release validation test cases - the same tests you see linked from the release validation pages, like:

https://fedoraproject.org/wiki/Test_Results:Fedora_25_Rawhide_20160704.n.0_Installation

Every result you see there from 'coconut' with the bot icon was actually produced by openQA.

The way it's set up at present, every time releng produces a compose - whether it's a nightly compose or a candidate of any kind - all the openQA tests are run for it. Each time an openQA test passes, a little intermediary between openQA and the wiki checks whether that compose was 'nominated' for testing - i.e. whether there are validation test pages on the wiki for it - and if so, converts the openQA result into one or more 'pass' wiki results. So ultimately, openQA saves us running those tests manually.

That was all we initially intended openQA to do, but since it runs on every compose, we took the opportunity to build out a couple of other things around it. You've probably seen the 'compose check report' emails that are sent to this list every time a compose is built: those list all failed openQA tests for the compose. The idea there was just to give people a convenient way to see roughly how well each day's compose is working - if lots and lots of tests failed, it's obviously pretty bad and you might want to avoid using it.

There's also the 'nightly compose finder' I wrote last cycle:

https://www.happyassassin.net/nightlies.html

The point of that is just to provide a convenient way to find the most recent compose of each image and, when possible, also the most recent compose of each image that passed all its tests. It takes test results from both openQA and autocloud, which is a separate automated testing system for cloud images.

So a lot of what openQA is intended to do is just sort of sit there, run tests, and provide us with information in various ways; you don't have to 'use' it, exactly. But you *can* interact with it for a few reasons. The most obvious is to look at the test results directly - where you get a lot more detail than in the email reports, the wiki or the nightly finder - and, when a test failed, figure out why and file a bug. :) Jan Sedlak and I already try to keep on top of this, but of course if anyone else wants to learn how to do it and help us out, that'd be great. Here's a quick starter guide!

The main starting pages in openQA (for me) are the overviews of results for a single compose. For instance, here's the overview for today's Rawhide nightly:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=Rawhide&build=Fedora-Rawhide-20160705.n.0&groupid=1

You can find the last three composes from the front page, and you can click 'fedora' on the front page to get several more before that:

https://openqa.fedoraproject.org/group_overview/1

If you want to find the overview for an even older compose, you can take the URL for a newer one and just change the compose ID.
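If you'd rather script this sort of thing than click around, openQA also has a REST API that serves the same data the overview pages do. Here's a minimal sketch in Python - the query parameters and field names are how I understand the API to behave, so treat them as assumptions and check them against a real response:

    import requests

    # List every openQA job for one compose and print its flavor, test
    # name, arch and result - roughly what the overview page shows.
    URL = "https://openqa.fedoraproject.org/api/v1/jobs"
    params = {
        "distri": "fedora",
        "version": "Rawhide",
        "build": "Fedora-Rawhide-20160705.n.0",
    }
    jobs = requests.get(URL, params=params).json()["jobs"]
    for job in sorted(jobs, key=lambda j: j["settings"].get("FLAVOR", "")):
        print("{0:25} {1:35} {2:7} {3}".format(
            job["settings"].get("FLAVOR", "?"), job["test"],
            job["settings"].get("ARCH", "?"), job["result"]))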
Let's go back to the overview for today's Rawhide nightly. You'll see several tables, with titles like 'Flavor: Atomic-boot-iso'. The 'flavors' are basically the different images; we have image-specific tests for several different images. There is also a special 'universal' flavor, which contains tests that can be run (more or less) on any installer image; these are usually run on the Server DVD, but will fall back to another image if that one isn't available.

In each table you'll see a row for each test that's run for that 'flavor', with columns for each arch (we currently only run openQA on i386 and x86_64). Some tests are run on x86_64 with both BIOS and UEFI; the UEFI test has '@uefi' appended to its name. For each arch that each test is run on (not every test runs on every arch) you'll see a colored circle. The color of the circle represents the state or result of the test. Note these colors actually changed a bit with the update today - I'll tell you the new colors, not the old ones:

* Dark blue means the test is scheduled to run but hasn't started yet
* Light blue means it's running right now
* Green means it finished and passed
* Orange means it finished and 'soft failed' (which is roughly like a 'warn' on the wiki - the test basically passed, but ran into a non-fatal bug along the way; e.g. right now the F24->F25 upgrade tests 'soft fail' because they have to pass enforcing=0 to work around https://bugzilla.redhat.com/show_bug.cgi?id=1349721 ;)
* Red means it failed
* Dark red means it couldn't even run at all (usually because we messed up the disk images or something; you should rarely see this in prod)
* Grey means it was skipped for some reason; usually this happens when it depends on another test which failed

Clicking on the circle takes you to the detailed page for that specific test (or 'job' in openQA terms). Let's look at a failed test:

https://openqa.fedoraproject.org/tests/24725

So how do we figure out what went wrong? Well, it helps to know roughly how openQA works. Very simply, what openQA does is run through a sequence of pre-planned actions - key presses and mouse movements - and check every so often that the screen looks the way it should at that point in the process. Every time one of these screen matches passes or fails, it takes a screenshot.

In this view, you see a bunch of thumbnails. A thumbnail with a green surround is a *passed* match. A thumbnail with a red surround is a *failed* match. A thumbnail with a grey surround doesn't represent a match, but was taken for some other reason (openQA takes these 'informational' screenshots every so often as it goes along, under various conditions).

Usually, when you're looking at a failed test, you'll see a red match somewhere. Here we can see it in the _do_install_and_reboot test:

https://openqa.fedoraproject.org/tests/24725#step/_do_install_and_reboot/33

In this case it's pretty obvious what's gone wrong, as the installer's showing an error message. But sometimes it'll be less obvious. The "Candidate needle" drop-down lets you see what openQA was expecting to see at this point and compare it to what it's actually seeing: you can pick any of the 'needles' (the reference screens) that openQA was looking for, and compare them to what's actually on screen.
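For the curious: on disk, a 'needle' is just a reference PNG screenshot plus a small JSON file naming the tags it satisfies and the areas of the screen that have to match. Here's a sketch of the idea in Python - the structure follows the needle format as I understand it, and the tag name is purely illustrative:

    import json

    # A needle's JSON: the tags it can satisfy, and the areas of the
    # reference PNG that must match the live screen.
    needle = json.loads("""
    {
      "tags": ["anaconda_install_complete"],
      "area": [
        {"xpos": 330, "ypos": 400, "width": 300, "height": 60,
         "type": "match"}
      ]
    }
    """)
    # When a test waits for a tag, openQA keeps comparing the current
    # screen against every needle carrying that tag; if none matches
    # before the timeout, the step fails - which is what happened here.
    print(needle["tags"], "-", len(needle["area"]), "area(s) to match")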
In this case openQA was expecting the 'install complete' screen to show up at some point, only it never did, because there was an error installing the bootloader. So it eventually just times out and gives up.

So OK, now we know the install failed at the point of trying to write the UEFI boot loader. Cool! This is already good information. But we can get more.

Look up near the top of the screen and you'll see there are a few tabs on this 'job' view - we're on the Details tab, but there are also Logs & Assets, Settings, Comments and Previous results. Logs & Assets is the really useful one here, so let's go there.

For *any* test that actually managed to run, you'll get a few things. vars.json is the openQA settings variables that were set for this test (I think at the time it failed); it isn't often super useful (mostly for diagnosing broken tests). serial0.txt is the log of the serial output (openQA uses this for various things; it's the main channel for getting analyzable text into and out of the test system). autoinst-log.txt is basically openQA's log of the actual test process; it's very, very verbose and can be hard to read, but it provides all the nitty-gritty details on what openQA was actually *doing* - what screens it was looking for, what it was typing and clicking, where it was moving the mouse, and so on. Most obviously useful is the Video. Yup, for every single test there's a (substantially sped-up) video recording you can watch, which is obviously really useful for figuring out what actually happened.

For some tests - like this one - you'll also find uploaded files from the test system (these are labelled 'Uploaded Logs', but they don't have to be logs; tests can be set to upload *any* file from the test box). Our tests are set up such that when an install test fails, openQA will try to go to a console and upload all the anaconda logs, plus /var/log and /var/tmp (where anaconda crash tracebacks go). So we can actually read the installer logs from the test! In this case I happen to know that program.log is usually the most useful in diagnosing bootloader install failures, so I can go look at it:

https://openqa.fedoraproject.org/tests/24725/file/_do_install_and_reboot-program.log

and down at the bottom we see the actual errors:

06:32:34,412 INFO program: Running... efibootmgr
06:32:34,586 DEBUG program: Return code: -11
06:32:34,587 INFO program: Running... efibootmgr -c -w -L Fedora -d /dev/vda -p 1 -l \EFI\fedora\shim.efi
06:32:34,674 DEBUG program: Return code: -11

Those efibootmgr calls should be returning 0 (return code 0 always means 'success', anything non-0 is bad; a negative number here follows the Python subprocess convention of 'killed by signal', so -11 means efibootmgr died with signal 11, i.e. it segfaulted). That's the problem here.
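Those per-job file URLs are stable, by the way, so you can grab the log with a script just as easily as with a browser. A quick sketch, again in Python - the URL is the real one from above, and the parsing is just based on the log lines we saw, so adjust as needed:

    import re
    import requests

    # Fetch the uploaded program.log for job 24725 and flag every
    # command that exited non-zero.
    log = requests.get(
        "https://openqa.fedoraproject.org/tests/24725/file/"
        "_do_install_and_reboot-program.log").text
    last_cmd = "?"
    for line in log.splitlines():
        if "Running..." in line:
            last_cmd = line.split("Running...")[-1].strip()
        m = re.search(r"Return code: (-?\d+)", line)
        if m and int(m.group(1)) != 0:
            print("exit %s from: %s" % (m.group(1), last_cmd))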
At this point we can use another of the tabs on the detailed job view, the 'Previous results' tab. This is really useful because it often lets you pinpoint exactly when something broke, which is obviously a big help in fixing it. If we look at that tab, we can see the test failed at the same point for each of the last four days, but worked fine on 20160630. So we now know (or at least strongly suspect) that something that changed between Fedora-Rawhide-20160630.n.0 and Fedora-Rawhide-20160701.n.0 is what broke this.

At that point I can go look at the 'Rawhide report' email for Fedora-Rawhide-20160701.n.0:

https://lists.fedoraproject.org/archives/list/test@xxxxxxxxxxxxxxxxxxxxxxx/message/BS5XIER32BKG3BR6KPPZYJKANTT6QJLE/

and hey, look at that, I see this package change:

Package: efivar-0.24-1.fc25
Old package: efivar-0.23-1.fc24
Summary: Tools to manage UEFI variables
RPMs: efivar efivar-devel efivar-libs
Size: 228452 bytes
Size change: 1600 bytes
Changelog:
* Thu Jun 30 2016 Peter Jones <pjones@xxxxxxxxxx> - 0.24-1
- Update to 0.24

That sure sounds like it might be related, huh? So now I can go file a bug that tells the packager:

* UEFI installs started failing on 2016-07-01
* Here are the efibootmgr messages from the log
* This efivar bump sure looks like it might be the cause

...and in fact that's exactly what I did:

https://bugzilla.redhat.com/show_bug.cgi?id=1352680

and once pjones is back from vacation, he'll fix it. :)

There are various other details and things, but that's the basic process of looking at an openQA test and figuring out what went wrong. Please do ask if you have any follow-up questions! The other thing you can interact with openQA for is to actually write or modify the tests it runs, which is a whole other topic :)
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net