Re: [PATCH v2 1/3] drm: Add support for panic message output

Noralf Trønnes <noralf@xxxxxxxxxxx> · Wed, 13 Mar 2019 11:24:30 +0100

Den 12.03.2019 23.13, skrev Ahmed S. Darwish:
> Hi,
> 
> [[ CCing John for the trylock parts ]]
> 
> On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 11.03.2019 20.23, skrev Daniel Vetter:
>>> On Mon, Mar 11, 2019 at 06:42:16PM +0100, Noralf Trønnes wrote:
>>>> This adds support for outputting kernel messages on panic().
>>>> A kernel message dumper is used to dump the log. The dumper iterates
>>>> over each DRM device and it's crtc's to find suitable framebuffers.
>>>>
>>>> All the other dumpers are run before this one except mtdoops.
>>>> Only atomic drivers are supported.
>>>>
>>>> Signed-off-by: Noralf Trønnes <noralf@xxxxxxxxxxx>
>>>
>>> Bunch of comments/ideas for you or Darwish below, whoever picks this up.
>>
>> Actually it would ne nice if Darwish could pick it up since he will do
>> it on i915 which will be useful to a much broader audience.
>> If not I'll respin when I'm done with the drm_fb_helper refactoring.
>>
> 
> Yup, I'll be more than happy to do this.. 

Thanks for doing that.

Noralf.

> while preserving all of
> Noralf's authorship and copyright notices of course.
> 
> I guess it can be:
> 
>   - Handle the comments posted by Daniel and others (I'll post
>     some questions too)
> 
>   - Add the necessary i915 specific bits
> 
>   - Test, post v3/v4/../vn. Rinse and repeat. Keep it local at
>     dri-devel until getting the necessary S-o-Bs.
> 
>   - Post to wider audience (some feedback from distribution folks
>     would also be nice, before posting to lkml)
> 
> More comments below..
> 
> [...]
> 
>>>> +
>>>> +static void drm_panic_kmsg_dump(struct kmsg_dumper *dumper,
>>>> +				enum kmsg_dump_reason reason)
>>>> +{
>>>> +	class_for_each_device(drm_class, NULL, dumper, drm_panic_dev_iter);
>>>
>>> class_for_each_device uses klist, which only uses an irqsave spinlock. I
>>> think that's good enough. Comment to that effect would be good e.g.
>>>
>>> 	/* based on klist, which uses only a spin_lock_irqsave, which we
>>> 	 * assume still works */
>>>
>>> If we aim for perfect this should be a trylock still, maybe using our own
>>> device list.
>>>
> 
> I definitely agree here.
> 
> The lock may already be locked either by a stopped CPU, or by the
> very same CPU we execute panic() on (e.g. NMI panic() on the
> printing CPU).
> 
> This is why it's very common for example in serial consoles, which
> are usually careful about re-entrance and panic contexts, to do:
> 
>   xxxxxx_console_write(...) {
> 	if (oops_in_progress)
> 		locked = spin_trylock_irqsave(&port->lock, flags);
> 	else
> 		spin_lock_irqsave(&port->lock, flags);
>   }
> 
> I'm quite positive we should do the same for panic drm drivers.
> John?
> 
>>>> +}
>>>> +
>>>> +static struct kmsg_dumper drm_panic_kmsg_dumper = {
>>>> +	.dump = drm_panic_kmsg_dump,
>>>> +	.max_reason = KMSG_DUMP_PANIC,
>>>> +};
>>>> +
>>>> +static ssize_t drm_panic_file_panic_write(struct file *file,
>>>> +					  const char __user *user_buf,
>>>> +					  size_t count, loff_t *ppos)
>>>> +{
>>>> +	unsigned long long val;
>>>> +	char buf[24];
>>>> +	size_t size;
>>>> +	ssize_t ret;
>>>> +
>>>> +	size = min(sizeof(buf) - 1, count);
>>>> +	if (copy_from_user(buf, user_buf, size))
>>>> +		return -EFAULT;
>>>> +
>>>> +	buf[size] = '\0';
>>>> +	ret = kstrtoull(buf, 0, &val);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	drm_panic_kmsg_dumper.max_reason = KMSG_DUMP_OOPS;
>>>> +	wmb();
>>>> +
>>>> +	/* Do a real test with: echo c > /proc/sysrq-trigger */
>>>> +
>>>> +	if (val == 0) {
>>>> +		pr_info("Test panic screen using kmsg_dump(OOPS)\n");
>>>> +		kmsg_dump(KMSG_DUMP_OOPS);
>>>> +	} else if (val == 1) {
>>>> +		char *null_pointer = NULL;
>>>> +
>>>> +		pr_info("Test panic screen using NULL pointer dereference\n");
>>>> +		*null_pointer = 1;
>>>> +	} else {
>>>> +		return -EINVAL;
>>>> +	}
>>>
>>> This isn't quite what I had in mind, since it still kills the kernel (like
>>> sysrq-trigger).
>>
>> If val == 0, it doesn't kill the kernel, it only dumps the kernel log.
>> And it doesn't taint the kernel either.
>>
>>> Instead what I had in mind is to recreate the worst
>>> possible panic context as much as feasible (disabling interrupts should be
>>> a good start, maybe we can even do an nmi callback), and then call our
>>> panic implementation. That way we can test the panic handler in a
>>> non-destructive way (i.e. aside from last dmesg lines printed to the
>>> screen nothing bad happens to the kernel: No real panic, no oops, no
>>> tainting).
>>
>> The interrupt case I can do, nmi I have no idea.
>>
> 
> I agree too. Disabling interrupts + CONFIG_DEBUG_ATOMIC_SLEEP
> would be a nice non-destructive test-case emulation.
> 
> thanks!
> 
> --
> darwi
> http://darwish.chasingpointers.com
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx