----- "Ryan Harper" <ryanh@xxxxxxxxxx> wrote:

> * Michael Goldish <mgoldish@xxxxxxxxxx> [2009-03-10 20:55]:
> >
> > ----- "Ryan Harper" <ryanh@xxxxxxxxxx> wrote:
> >
> > > > >- guest install wizard using md5sum region matching ... ouch. This is
> > > > >quite fickle. I've seen different kvms generate different md5sum for
> > > > >the same region a couple of times. I know distributing screenshots of
> > > > >certain OSes is a grey area, but it would be nice to plumb through
> > > > >screenshot comparison and make that configurable. FWIW, I'll probably
> > > > >look at pulling the screenshot comparison bits from kvmtest and getting
> > > > >that integrated in kvm_runtest_2.
> > > >
> > > > Creating a step file is not as easy as it seems, exactly for that reason.
> > > > One has to pick a part of the screenshot that only available when input is
> > > > expected and that would be consistent. We were thinking of replacing the
> > > > md5sum with a tiny compressed image of the part of the image that was
> > > > picked.
> > >
> > > It isn't just that step file creation isn't easy is that even with a
> > > good stepfile with smart region boxes, md5sum can *still* fail because
> > > KVM renders one pixel in the region differently than the system where the
> > > original was created; this creates false positives failures.
> >
> > I'd like to comment on this. I don't doubt that some fuzzy matching
> > algorithm (such as calculating match percentages) would generally be
> > more robust. I do however doubt it would significantly lower the false
> > positive rate in our case (which is fairly low already). False
> > positive failures in step files are typically caused by:
>
> I've seen multiple failures during the windows guest installs which I
> assume are well tested stepfiles.
> For example, 2k8 installs and the fails to pass the barrier when
> trying to set the user password for the first time. The cropped
> region *looks* exactly like the intended location on the screendump,
> but md5sums to something different.
>
> A recent run of 2k3 and 2k8 installs resulted in the following
> failures:
>
> Win2k3-32bit -- screenshot of "Windows Setup" and Setup is starting
> windows, cropped region is of "Setup is starting Windows" full screen
> dump matches this text from a human pov
>
> Win2k3-64-bit -- same as above
>
> Win2k8-32-bit -- screenshot of "The user's password must be changed
> before logging in the first time with OK and cancel buttons. - cropped
> region is of the text "The user's password must be changed before
> logging in the first time" - matching the full screen screendump fine
> from a human POV
>
> Win2k8-64-bit -- same as above
>
> We've also been creating stepfiles for Linux guests as well that aren't
> here, various SLES and RHEL installs -- and I've repeatedly seen the
> same issue where the cropped region *should* match but isn't, and it
> isn't a result of any of the very correct reasons you've listed below as
> to why the stepfiles might fail.

The Windows failures you're describing sound like they could be caused
by a known KVM bug, which results in Windows installations sometimes
booting from CDROM, instead of the HDD, immediately following the
installation.

I assume you don't have the stepmaker data of those Windows stepfiles.
In that case, the images left by the stepfile test, scrdump.ppm and
cropped_scrdump.ppm, are in fact the full screendump and a cropped
region of it. They should always match perfectly, because the cropped
one is generated from the full one at runtime. Neither of them reflects
the expected guest behavior; they reflect what the stepfile test
actually found. The only thing you have that reflects the expected
guest behavior is the md5sum found in the stepfile.
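
To make the point above concrete, here is a minimal sketch (not the
actual kvm_runtest_2 code) of cutting a region out of a full screendump
and hashing it. The function names, the raw 3-bytes-per-pixel RGB
layout, and the choice to hash only the pixel data are all assumptions
made for illustration; the real test may hash something different.

```python
import hashlib

def crop_rgb(pixels, width, x, y, w, h):
    """Return the bytes of a w*h region of a raw RGB buffer (3 bytes/pixel).

    Hypothetical helper: 'pixels' is the pixel data of the full
    screendump, 'width' is the screendump width in pixels.
    """
    rows = []
    for row in range(y, y + h):
        start = (row * width + x) * 3
        rows.append(pixels[start:start + w * 3])
    return b"".join(rows)

def region_md5(pixels, width, x, y, w, h):
    """md5sum of the cropped region, as a hex string."""
    return hashlib.md5(crop_rgb(pixels, width, x, y, w, h)).hexdigest()

# A 4x2 all-black "screendump" and a copy that differs in one channel
# of the pixel at (2, 1).
width, height = 4, 2
full = bytes(width * height * 3)
other = bytearray(full)
other[(1 * width + 2) * 3] = 1
other = bytes(other)

# The cropped region is derived from the full image, so it always
# agrees with it; only a stored md5sum carries the expectation.
expected_md5 = region_md5(full, width, 1, 0, 3, 2)
print(region_md5(other, width, 1, 0, 3, 2) == expected_md5)  # region covers the changed pixel
print(region_md5(other, width, 0, 0, 2, 2) == region_md5(full, width, 0, 0, 2, 2))  # region avoids it
```

The first comparison fails because the region includes the changed
pixel; the second succeeds because it does not. That is why the choice
of barrier region matters so much with an exact md5sum match.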
If you happened to keep the "debug" dirs which contain the screendumps
and test logs, and could somehow send them to me or Uri, I'd be able to
tell you what went wrong with the test and whether it is indeed that
KVM bug or a stepfile error. We could probably also use the stepfiles
you were working with, because we might have changed ours recently,
though that is unlikely because we don't change old stepfiles very
often nowadays.

Regarding the stepfiles you created for Linux -- I can't help much with
those since I don't have the data. I do believe that if I had the data
and the stepfiles I could quickly identify the problem, so if you think
those can be sent to us, I'd like to have them.

I'm not sure exactly what version of kvm_runtest_2 you're using (are
you using kvm_runtest_2?), but I think it should support automatic
comparison of the actual screendump with the expected screendump. If
you have a slightly older version than the current git HEAD, then you
should probably place your <stepfile>_data directory right next to
<stepfile>, and whenever a stepfile test fails you'll get -- in
addition to scrdump.ppm and cropped_scrdump.ppm --
scrdump_reference.ppm and cropped_scrdump_reference.ppm, as well as a
nice green-red comparison image which colors all matching pixels green
and all mismatching ones red. That last image is very helpful when
stepfiles require fixing.

If you have the latest git HEAD, you should place all your
<stepfile>_data dirs in a dir named "steps_data" which should reside
next to "steps" (which should contain the stepfiles themselves).
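
For readers who have not seen the green-red comparison image, the idea
can be sketched roughly as follows. This is an illustration only, not
the code kvm_runtest_2 uses; the function names are hypothetical, and
the inputs are assumed to be raw RGB pixel buffers of identical size
(3 bytes per pixel) rather than PPM files read from disk.

```python
GREEN = bytes((0, 255, 0))
RED = bytes((255, 0, 0))

def compare_rgb(actual, expected):
    """Compare two equal-sized raw RGB buffers pixel by pixel.

    Returns (diff_pixels, mismatches): diff_pixels has one green pixel
    for every match and one red pixel for every mismatch.
    """
    if len(actual) != len(expected) or len(actual) % 3:
        raise ValueError("buffers must be equal in size and a multiple of 3 bytes")
    out = bytearray()
    mismatches = 0
    for i in range(0, len(actual), 3):
        if actual[i:i + 3] == expected[i:i + 3]:
            out += GREEN
        else:
            out += RED
            mismatches += 1
    return bytes(out), mismatches

def ppm_bytes(width, height, pixels):
    """Wrap raw RGB pixels in a binary PPM (P6) header."""
    return b"P6\n%d %d\n255\n" % (width, height) + pixels

# 2x2 example: the second pixel of the reference differs in one channel.
actual = bytes(2 * 2 * 3)
reference = bytearray(actual)
reference[3] = 9
diff, bad = compare_rgb(actual, bytes(reference))
print("mismatching pixels: %d of %d" % (bad, len(actual) // 3))
comparison_image = ppm_bytes(2, 2, diff)  # could be written out as a .ppm file
```

The mismatch count is the same number a match-percentage comparison
would be built on, so a sketch like this can also serve as a starting
point for experimenting with more forgiving comparisons.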
> > - an unexpected popup window covering the test region
> > - a dialog which has a different position every time (and varies by
> >   many pixels)
> > - a barrier that passes before the controls get input focus, which
> >   causes the following keystrokes to have no effect
> > - in text mode, sometimes a line of text is printed unexpectedly and
> >   causes the entire screen to scroll up
> > - addition/modification of a KVM feature which changes the course of
> >   the installation
> >
> > I may have left something out. In any case, all these problems are
> > solved by picking better barrier regions, but none can be solved by
> > using a more forgiving comparison method. I have encountered a single
> > installation that rendered a single pixel in an indeterministic
> > fashion, and though this problem was easily solved by correcting the
> > barrier (using a stepfile editor), I do agree we might get a small
> > decrease in the false positive rate if we use a more forgiving
> > algorithm.
>
> Well, either there is a *bug* right now that is triggering a higher rate
> of false positives, or using a better algorithm is a requirement;
> distributing stepfiles and md5sums that don't work isn't productive, so
> in the case that it is a bug I still suggest we pursue a more resilient
> algorithm.

Do the Windows tests you mentioned fail consistently, or have you
witnessed any of them succeed in some of the runs?

> > However, there is also a risk: a more forgiving algorithm may forgive
> > KVM for rendering errors. It may also make it risky to pick barriers
> > that are meant to catch small text; I believe a button with a "Yes"
> > caption and a button with a "No" caption would have a very high match
> > percentage, especially if you have to pick the whole button, or maybe
> > even some of its surroundings (and you often do).
>
> Noted, though I think as you indicated above, smart selection of the
> cropped region goes a long way toward avoiding these sorts of
> collisions.
>
> > I still believe it's a good idea to look into other methods (we're
> > already doing that) and start experimenting with them.
>
> Cool. Obviously without the original ppm files from the stepmaker run,
> we can't determine if a different algo would help so we're generating
> new stepfiles and ppm data and trying to reproduce the md5sum mismatch
> issues. If there is anything I can do to help with the algo work let me
> know.

Thanks, I certainly will. I also appreciate your help so far.

Michael