2013/11/11 Daniel Vetter <daniel@xxxxxxxx>: > On Mon, Nov 11, 2013 at 03:06:09PM -0200, Paulo Zanoni wrote: >> From: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx> >> >> These functions should help you checking for new Kernel error >> messages. One of the problems I had while writing the runtime PM test >> suite is that when you read the sysfs and debugfs files, the only way >> to detect errors is by checking dmesg, so I was always getting SUCCESS >> even if the test caught a bug. Also, we have so many debugfs/sysfs >> files that it was not easy to discover which file caused the error >> messages I was seeing. So this commit adds some infrastructure to >> allow us to automatically check for new errors on dmesg. >> >> Use it like this: >> >> int main(int argc, char *argv[]) { >> int fd, i; >> >> igt_fixture >> fd = kmsg_error_setup(); >> >> igt_subtest("t1") { >> kmsg_error_reset(fd); >> do_something(); >> kmsg_error_detect(""); >> } >> >> igt_subtest("t2") { >> for (i = 0; i < 10; i++) { >> char *file_name = get_file(i); >> kmsg_error_reset(fd); >> process_file(file_name); >> kmsg_error_detect(file_name): >> } >> } >> >> igt_fixture >> kmsg_error_teardown(fd); >> } > > Imo that's the wrong approach. _Every_ test should fail if we end up with > errors/backtraces in dmesg. That's exactly why I wrote code to check dmesg! We could, in the future, make the igt_subtest macros call this code automatically. > And if you look for very specific stuff (like > gpu hang or missed irq warnings) the approach thus far has been to expose > that information somehow through debugfs files. So you're suggesting I should create some sort of debugfs interface to expose every single WARN our driver does? Doesn't really sound like a good idea, unless we invent our our I915_WARN, I915_ERROR, etc.. And we still won't catch the WARNs and ERRORs spit by drm.ko or anything outside i915.ko. > That way we're independent > from the exact string used in the kernel output. ZZ_check_dmesg already parses dmesg strings, I just copied it. Also, the whole IGT already relies way too much on being ran against very-recent libdrm/Kernels, we're just adding one more dependency. And we can also add the newer strings if somebody ever changes the WARN or DRM_ERROR output: it's not like our code will completely break, it just won't be as good. And we always require everybody to use IGT git master anyway. I don't see the problem. > > I think the right approach is to add this to the test runner, i.e. piglit. > There's already very basic support to capture the (new) dmesg output for > each test with the --dmesg option. Have you played around with that and > tried to extend it to your liking? My goal is that I want to know, inside a test program, which line of code introduced the dmesg error, and if we use some sort of external approach like what you're suggesting this won't be possible. I have code that opens hundreds of sysfs and debugfs files, and I want to check dmesg after I open/close every single file, to be able to detect which one exactly causes the problem. I'm already using this locally and it *really* saved a lot of time for me. If we don 't accept this code inside drmtest.c, I'm gonna ask if I can push it directly to pm_pc8.c. > -Daniel > > > >> >> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx> >> --- >> lib/drmtest.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> lib/drmtest.h | 13 ++++++++++ >> 2 files changed, 92 insertions(+) >> >> diff --git a/lib/drmtest.c b/lib/drmtest.c >> index d8fc60f..ebabfd8 100644 >> --- a/lib/drmtest.c >> +++ b/lib/drmtest.c >> @@ -41,6 +41,7 @@ >> #include <linux/kd.h> >> #include <unistd.h> >> #include <sys/wait.h> >> +#include <poll.h> >> #include "drm_fourcc.h" >> >> #include "drmtest.h" >> @@ -2307,3 +2308,81 @@ void igt_system_suspend_autoresume(void) >> ret = system("rtcwake -s 30 -m mem"); >> igt_assert(ret == 0); >> } >> + >> +/* The kmsg error detecting functions allow you to catch new error messages from >> + * the Kernel. The idea is that you have to call kmsg_error_setup() only once at >> + * the beginning, and kmsg_error_teardown() at the end. And for every subtest, >> + * you run kmsg_error_reset() at the begin and kmsg_error_detect() at the end: >> + * this should make sure that kmsg_error_detect() will only catch the error >> + * messages that were introduced after the subtest started. */ >> +int kmsg_error_setup(void) >> +{ >> + int fd = open("/dev/kmsg", O_RDONLY); >> + >> + igt_assert_f(fd >= 0, "Can't open /dev/kmsg\n"); >> + return fd; >> +} >> + >> +void kmsg_error_teardown(int fd) >> +{ >> + close(fd); >> +} >> + >> +/* You have to call this before running your subtest, so that the next time you >> + * call kmsg_error_detect you'll only look at the new kmsg lines. */ >> +void kmsg_error_reset(int fd) >> +{ >> + lseek(fd, 0, SEEK_END); >> +} >> + >> +static void kmsg_error_line_parse(const char *line, const char *error_msg) >> +{ >> + igt_assert_f(strstr(line, "------------[ cut here ]------------") == 0, >> + "%s\n", error_msg); >> + igt_assert_f(strstr(line, "*ERROR*") == 0, "%s\n", error_msg); >> + igt_assert_f(strstr(line, "BUG: sleeping function called from invalid context at") == 0, >> + "%s\n", error_msg); >> +} >> + >> +/* Keep reading the Kernel ring buffer and checking for errors. Stop reading if >> + * there's nothing new on the buffer after the timeout. Notice that every time >> + * you call this function, the time it will take to return will always be >= the >> + * timeout. */ >> +void kmsg_error_detect(int fd, int timeout_ms, const char *error_msg) >> +{ >> + int i, rc; >> + int line_size = 128, line_used = 0, buf_size = 128; >> + char buf[buf_size]; >> + char *line; >> + struct pollfd pfd; >> + ssize_t readed; >> + >> + line = malloc(sizeof(char) * line_size); >> + igt_assert(line); >> + >> + pfd.fd = fd; >> + pfd.events = POLLIN | POLLPRI; >> + >> + while (1) { >> + pfd.revents = 0; >> + rc = poll(&pfd, 1, timeout_ms); >> + if (!rc) >> + break; >> + >> + readed = read(fd, &buf, buf_size - 1); >> + for (i = 0; i < readed; i++) { >> + if (line_used + 1 >= line_size) { >> + line = realloc(line, line_size * 2); >> + line_size *= 2; >> + igt_assert(line); >> + } >> + line[line_used++] = buf[i]; >> + if (buf[i] == '\n') { >> + line[line_used] = '\0'; >> + kmsg_error_line_parse(line, error_msg); >> + line_used = 0; >> + } >> + } >> + } >> + free(line); >> +} >> diff --git a/lib/drmtest.h b/lib/drmtest.h >> index a9fd0bc..05e3629 100644 >> --- a/lib/drmtest.h >> +++ b/lib/drmtest.h >> @@ -413,4 +413,17 @@ void igt_enable_prefault(void); >> /* suspend and auto-resume system */ >> void igt_system_suspend_autoresume(void); >> >> +/* Returns the fd for the other functions. */ >> +int kmsg_error_setup(void); >> +/* Closes the fd */ >> +void kmsg_error_teardown(int fd); >> +/* Skip to the end of the kmsg fd, so that the next time you call >> + * kmsg_error_detect() you will only parse the new messages. */ >> +void kmsg_error_reset(int fd); >> +/* Reads all the new Kernel messages since the last time you called >> + * kmsg_error_teardown, and also waits for new messages for timeout_ms. In case >> + * we find any error we'll also print error_msg. Uses igt_assert, so no need to >> + * check for return values. */ >> +void kmsg_error_detect(int fd, int timeout_ms, const char *error_msg); >> + >> #endif /* DRMTEST_H */ >> -- >> 1.8.3.1 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Paulo Zanoni _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx