Den 12.03.2019 23.13, skrev Ahmed S. Darwish: > Hi, > > [[ CCing John for the trylock parts ]] > > On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote: >> >> >> Den 11.03.2019 20.23, skrev Daniel Vetter: >>> On Mon, Mar 11, 2019 at 06:42:16PM +0100, Noralf Trønnes wrote: >>>> This adds support for outputting kernel messages on panic(). >>>> A kernel message dumper is used to dump the log. The dumper iterates >>>> over each DRM device and it's crtc's to find suitable framebuffers. >>>> >>>> All the other dumpers are run before this one except mtdoops. >>>> Only atomic drivers are supported. >>>> >>>> Signed-off-by: Noralf Trønnes <noralf@xxxxxxxxxxx> >>> >>> Bunch of comments/ideas for you or Darwish below, whoever picks this up. >> >> Actually it would ne nice if Darwish could pick it up since he will do >> it on i915 which will be useful to a much broader audience. >> If not I'll respin when I'm done with the drm_fb_helper refactoring. >> > > Yup, I'll be more than happy to do this.. Thanks for doing that. Noralf. > while preserving all of > Noralf's authorship and copyright notices of course. > > I guess it can be: > > - Handle the comments posted by Daniel and others (I'll post > some questions too) > > - Add the necessary i915 specific bits > > - Test, post v3/v4/../vn. Rinse and repeat. Keep it local at > dri-devel until getting the necessary S-o-Bs. > > - Post to wider audience (some feedback from distribution folks > would also be nice, before posting to lkml) > > More comments below.. > > [...] > >>>> + >>>> +static void drm_panic_kmsg_dump(struct kmsg_dumper *dumper, >>>> + enum kmsg_dump_reason reason) >>>> +{ >>>> + class_for_each_device(drm_class, NULL, dumper, drm_panic_dev_iter); >>> >>> class_for_each_device uses klist, which only uses an irqsave spinlock. I >>> think that's good enough. Comment to that effect would be good e.g. >>> >>> /* based on klist, which uses only a spin_lock_irqsave, which we >>> * assume still works */ >>> >>> If we aim for perfect this should be a trylock still, maybe using our own >>> device list. >>> > > I definitely agree here. > > The lock may already be locked either by a stopped CPU, or by the > very same CPU we execute panic() on (e.g. NMI panic() on the > printing CPU). > > This is why it's very common for example in serial consoles, which > are usually careful about re-entrance and panic contexts, to do: > > xxxxxx_console_write(...) { > if (oops_in_progress) > locked = spin_trylock_irqsave(&port->lock, flags); > else > spin_lock_irqsave(&port->lock, flags); > } > > I'm quite positive we should do the same for panic drm drivers. > John? > >>>> +} >>>> + >>>> +static struct kmsg_dumper drm_panic_kmsg_dumper = { >>>> + .dump = drm_panic_kmsg_dump, >>>> + .max_reason = KMSG_DUMP_PANIC, >>>> +}; >>>> + >>>> +static ssize_t drm_panic_file_panic_write(struct file *file, >>>> + const char __user *user_buf, >>>> + size_t count, loff_t *ppos) >>>> +{ >>>> + unsigned long long val; >>>> + char buf[24]; >>>> + size_t size; >>>> + ssize_t ret; >>>> + >>>> + size = min(sizeof(buf) - 1, count); >>>> + if (copy_from_user(buf, user_buf, size)) >>>> + return -EFAULT; >>>> + >>>> + buf[size] = '\0'; >>>> + ret = kstrtoull(buf, 0, &val); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + drm_panic_kmsg_dumper.max_reason = KMSG_DUMP_OOPS; >>>> + wmb(); >>>> + >>>> + /* Do a real test with: echo c > /proc/sysrq-trigger */ >>>> + >>>> + if (val == 0) { >>>> + pr_info("Test panic screen using kmsg_dump(OOPS)\n"); >>>> + kmsg_dump(KMSG_DUMP_OOPS); >>>> + } else if (val == 1) { >>>> + char *null_pointer = NULL; >>>> + >>>> + pr_info("Test panic screen using NULL pointer dereference\n"); >>>> + *null_pointer = 1; >>>> + } else { >>>> + return -EINVAL; >>>> + } >>> >>> This isn't quite what I had in mind, since it still kills the kernel (like >>> sysrq-trigger). >> >> If val == 0, it doesn't kill the kernel, it only dumps the kernel log. >> And it doesn't taint the kernel either. >> >>> Instead what I had in mind is to recreate the worst >>> possible panic context as much as feasible (disabling interrupts should be >>> a good start, maybe we can even do an nmi callback), and then call our >>> panic implementation. That way we can test the panic handler in a >>> non-destructive way (i.e. aside from last dmesg lines printed to the >>> screen nothing bad happens to the kernel: No real panic, no oops, no >>> tainting). >> >> The interrupt case I can do, nmi I have no idea. >> > > I agree too. Disabling interrupts + CONFIG_DEBUG_ATOMIC_SLEEP > would be a nice non-destructive test-case emulation. > > thanks! > > -- > darwi > http://darwish.chasingpointers.com > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx