Hi, On Tue, Mar 12, 2019 at 11:58:10AM +0100, Daniel Vetter wrote: > On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote: > > > > > > Den 11.03.2019 20.23, skrev Daniel Vetter: > > > On Mon, Mar 11, 2019 at 06:42:16PM +0100, Noralf Trønnes wrote: [...] > > >> +} > > >> + > > >> +static struct kmsg_dumper drm_panic_kmsg_dumper = { > > >> + .dump = drm_panic_kmsg_dump, > > >> + .max_reason = KMSG_DUMP_PANIC, > > >> +}; > > >> + > > >> +static ssize_t drm_panic_file_panic_write(struct file *file, > > >> + const char __user *user_buf, > > >> + size_t count, loff_t *ppos) > > >> +{ > > >> + unsigned long long val; > > >> + char buf[24]; > > >> + size_t size; > > >> + ssize_t ret; > > >> + > > >> + size = min(sizeof(buf) - 1, count); > > >> + if (copy_from_user(buf, user_buf, size)) > > >> + return -EFAULT; > > >> + > > >> + buf[size] = '\0'; > > >> + ret = kstrtoull(buf, 0, &val); > > >> + if (ret) > > >> + return ret; > > >> + > > >> + drm_panic_kmsg_dumper.max_reason = KMSG_DUMP_OOPS; > > >> + wmb(); > > >> + > > >> + /* Do a real test with: echo c > /proc/sysrq-trigger */ > > >> + > > >> + if (val == 0) { > > >> + pr_info("Test panic screen using kmsg_dump(OOPS)\n"); > > >> + kmsg_dump(KMSG_DUMP_OOPS); > > >> + } else if (val == 1) { > > >> + char *null_pointer = NULL; > > >> + > > >> + pr_info("Test panic screen using NULL pointer dereference\n"); > > >> + *null_pointer = 1; > > >> + } else { > > >> + return -EINVAL; > > >> + } >> > > > > > This isn't quite what I had in mind, since it still kills the kernel (like > > > sysrq-trigger). > > > > If val == 0, it doesn't kill the kernel, it only dumps the kernel log. > > And it doesn't taint the kernel either. > > Ah I didn't realize that. Sounds like a good option to keep. > > > > Instead what I had in mind is to recreate the worst > > > possible panic context as much as feasible (disabling interrupts should be > > > a good start, maybe we can even do an nmi callback), and then call our > > > panic implementation. That way we can test the panic handler in a > > > non-destructive way (i.e. aside from last dmesg lines printed to the > > > screen nothing bad happens to the kernel: No real panic, no oops, no > > > tainting). > > > > The interrupt case I can do, nmi I have no idea. > > I just read the printk nmi code again and it looks like there's now even > more special handling for issues happening in nmi context, so we should > never see an oops from nmi context. So local_irq_disable() wrapping should > be good enough for testing. > -Daniel > Hmmm, John is on a mission to change that soon: - https://lore.kernel.org/lkml/20190212143003.48446-1-john.ogness@xxxxxxxxxxxxx - https://lwn.net/Articles/780556 A v2 is on the way, and there's a lot of interest from different groups, including RT, to get that in. So I guess it would be nice to cover NMI contexts from the start, not necessarily from a non-destructive CI testing perspective, but at least while doing normal tests and code reviews. (Is it possible to do non-destructive NMI tests? I don't know..) thanks! -- darwi http://darwish.chasingpointers.com _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx