Re: [PATCH 1/2] drmtest: introduce kmsg_error functions

Paulo Zanoni <przanoni@xxxxxxxxx> · Mon, 11 Nov 2013 16:18:04 -0200

2013/11/11 Daniel Vetter <daniel@xxxxxxxx>:
> On Mon, Nov 11, 2013 at 03:06:09PM -0200, Paulo Zanoni wrote:
>> From: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
>>
>> These functions should help you checking for new Kernel error
>> messages. One of the problems I had while writing the runtime PM test
>> suite is that when you read the sysfs and debugfs files, the only way
>> to detect errors is by checking dmesg, so I was always getting SUCCESS
>> even if the test caught a bug. Also, we have so many debugfs/sysfs
>> files that it was not easy to discover which file caused the error
>> messages I was seeing. So this commit adds some infrastructure to
>> allow us to automatically check for new errors on dmesg.
>>
>> Use it like this:
>>
>> int main(int argc, char *argv[]) {
>>       int fd, i;
>>
>>       igt_fixture
>>               fd = kmsg_error_setup();
>>
>>       igt_subtest("t1") {
>>               kmsg_error_reset(fd);
>>               do_something();
>>               kmsg_error_detect("");
>>       }
>>
>>       igt_subtest("t2") {
>>               for (i = 0; i < 10; i++) {
>>                       char *file_name = get_file(i);
>>                       kmsg_error_reset(fd);
>>                       process_file(file_name);
>>                       kmsg_error_detect(file_name):
>>               }
>>       }
>>
>>       igt_fixture
>>               kmsg_error_teardown(fd);
>> }
>
> Imo that's the wrong approach. _Every_ test should fail if we end up with
> errors/backtraces in dmesg.

That's exactly why I wrote code to check dmesg! We could, in the
future, make the igt_subtest macros call this code automatically.

> And if you look for very specific stuff (like
> gpu hang or missed irq warnings) the approach thus far has been to expose
> that information somehow through debugfs files.

So you're suggesting I should create some sort of debugfs interface to
expose every single WARN our driver does? Doesn't really sound like a
good idea, unless we invent our our I915_WARN, I915_ERROR, etc.. And
we still won't catch the WARNs and ERRORs spit by drm.ko or anything
outside i915.ko.

> That way we're independent
> from the exact string used in the kernel output.

ZZ_check_dmesg already parses dmesg strings, I just copied it. Also,
the whole IGT already relies way too much on being ran against
very-recent libdrm/Kernels, we're just adding one more dependency. And
we can also add the newer strings if somebody ever changes the WARN or
DRM_ERROR output: it's not like our code will completely break, it
just won't be as good. And we always require everybody to use IGT git
master anyway. I don't see the problem.

>
> I think the right approach is to add this to the test runner, i.e. piglit.
> There's already very basic support to capture the (new) dmesg output for
> each test with the --dmesg option. Have you played around with that and
> tried to extend it to your liking?

My goal is that I want to know, inside a test program, which line of
code introduced the dmesg error, and if we use some sort of external
approach like what you're suggesting this won't be possible. I have
code that opens hundreds of sysfs and debugfs files, and I want to
check dmesg after I open/close every single file, to be able to detect
which one exactly causes the problem. I'm already using this locally
and it *really* saved a lot of time for me. If we don 't accept this
code inside drmtest.c, I'm gonna ask if I can push it directly to
pm_pc8.c.

> -Daniel
>
>
>
>>
>> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
>> ---
>>  lib/drmtest.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  lib/drmtest.h | 13 ++++++++++
>>  2 files changed, 92 insertions(+)
>>
>> diff --git a/lib/drmtest.c b/lib/drmtest.c
>> index d8fc60f..ebabfd8 100644
>> --- a/lib/drmtest.c
>> +++ b/lib/drmtest.c
>> @@ -41,6 +41,7 @@
>>  #include <linux/kd.h>
>>  #include <unistd.h>
>>  #include <sys/wait.h>
>> +#include <poll.h>
>>  #include "drm_fourcc.h"
>>
>>  #include "drmtest.h"
>> @@ -2307,3 +2308,81 @@ void igt_system_suspend_autoresume(void)
>>       ret = system("rtcwake -s 30 -m mem");
>>       igt_assert(ret == 0);
>>  }
>> +
>> +/* The kmsg error detecting functions allow you to catch new error messages from
>> + * the Kernel. The idea is that you have to call kmsg_error_setup() only once at
>> + * the beginning, and kmsg_error_teardown() at the end. And for every subtest,
>> + * you run kmsg_error_reset() at the begin and kmsg_error_detect() at the end:
>> + * this should make sure that kmsg_error_detect() will only catch the error
>> + * messages that were introduced after the subtest started. */
>> +int kmsg_error_setup(void)
>> +{
>> +     int fd = open("/dev/kmsg", O_RDONLY);
>> +
>> +     igt_assert_f(fd >= 0, "Can't open /dev/kmsg\n");
>> +     return fd;
>> +}
>> +
>> +void kmsg_error_teardown(int fd)
>> +{
>> +     close(fd);
>> +}
>> +
>> +/* You have to call this before running your subtest, so that the next time you
>> + * call kmsg_error_detect you'll only look at the new kmsg lines. */
>> +void kmsg_error_reset(int fd)
>> +{
>> +     lseek(fd, 0, SEEK_END);
>> +}
>> +
>> +static void kmsg_error_line_parse(const char *line, const char *error_msg)
>> +{
>> +     igt_assert_f(strstr(line, "------------[ cut here ]------------") == 0,
>> +                  "%s\n", error_msg);
>> +     igt_assert_f(strstr(line, "*ERROR*") == 0, "%s\n", error_msg);
>> +     igt_assert_f(strstr(line, "BUG: sleeping function called from invalid context at") == 0,
>> +                  "%s\n", error_msg);
>> +}
>> +
>> +/* Keep reading the Kernel ring buffer and checking for errors. Stop reading if
>> + * there's nothing new on the buffer after the timeout. Notice that every time
>> + * you call this function, the time it will take to return will always be >= the
>> + * timeout. */
>> +void kmsg_error_detect(int fd, int timeout_ms, const char *error_msg)
>> +{
>> +     int i, rc;
>> +     int line_size = 128, line_used = 0, buf_size = 128;
>> +     char buf[buf_size];
>> +     char *line;
>> +     struct pollfd pfd;
>> +     ssize_t readed;
>> +
>> +     line = malloc(sizeof(char) * line_size);
>> +     igt_assert(line);
>> +
>> +     pfd.fd = fd;
>> +     pfd.events = POLLIN | POLLPRI;
>> +
>> +     while (1) {
>> +             pfd.revents = 0;
>> +             rc = poll(&pfd, 1, timeout_ms);
>> +             if (!rc)
>> +                     break;
>> +
>> +             readed = read(fd, &buf, buf_size - 1);
>> +             for (i = 0; i < readed; i++) {
>> +                     if (line_used + 1 >= line_size) {
>> +                             line = realloc(line, line_size * 2);
>> +                             line_size *= 2;
>> +                             igt_assert(line);
>> +                     }
>> +                     line[line_used++] = buf[i];
>> +                     if (buf[i] == '\n') {
>> +                             line[line_used] = '\0';
>> +                             kmsg_error_line_parse(line, error_msg);
>> +                             line_used = 0;
>> +                     }
>> +             }
>> +     }
>> +     free(line);
>> +}
>> diff --git a/lib/drmtest.h b/lib/drmtest.h
>> index a9fd0bc..05e3629 100644
>> --- a/lib/drmtest.h
>> +++ b/lib/drmtest.h
>> @@ -413,4 +413,17 @@ void igt_enable_prefault(void);
>>  /* suspend and auto-resume system */
>>  void igt_system_suspend_autoresume(void);
>>
>> +/* Returns the fd for the other functions. */
>> +int kmsg_error_setup(void);
>> +/* Closes the fd */
>> +void kmsg_error_teardown(int fd);
>> +/* Skip to the end of the kmsg fd, so that the next time you call
>> + * kmsg_error_detect() you will only parse the new messages. */
>> +void kmsg_error_reset(int fd);
>> +/* Reads all the new Kernel messages since the last time you called
>> + * kmsg_error_teardown, and also waits for new messages for timeout_ms. In case
>> + * we find any error we'll also print error_msg. Uses igt_assert, so no need to
>> + * check for return values. */
>> +void kmsg_error_detect(int fd, int timeout_ms, const char *error_msg);
>> +
>>  #endif /* DRMTEST_H */
>> --
>> 1.8.3.1
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Paulo Zanoni
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx