On Thu, Mar 14, 2019 at 05:45:32AM +0100, Ahmed S. Darwish wrote: > Hi, > > On Wed, Mar 13, 2019 at 09:35:05AM +0100, Daniel Vetter wrote: > > On Tue, Mar 12, 2019 at 11:13:03PM +0100, Ahmed S. Darwish wrote: > > > On Mon, Mar 11, 2019 at 11:33:15PM +0100, Noralf Trønnes wrote: > > > > Den 11.03.2019 20.23, skrev Daniel Vetter: > [……] > > > > > > > > > > class_for_each_device uses klist, which only uses an irqsave spinlock. I > > > > > think that's good enough. Comment to that effect would be good e.g. > > > > > > > > > > /* based on klist, which uses only a spin_lock_irqsave, which we > > > > > * assume still works */ > > > > > > > > > > If we aim for perfect this should be a trylock still, maybe using our own > > > > > device list. > > > > > > > > > > > I definitely agree here. > > > > > > The lock may already be locked either by a stopped CPU, or by the > > > very same CPU we execute panic() on (e.g. NMI panic() on the > > > printing CPU). > > > > > > This is why it's very common for example in serial consoles, which > > > are usually careful about re-entrance and panic contexts, to do: > > > > > > xxxxxx_console_write(...) { > > > if (oops_in_progress) > > > locked = spin_trylock_irqsave(&port->lock, flags); > > > else > > > spin_lock_irqsave(&port->lock, flags); > > > } > > > > > > I'm quite positive we should do the same for panic drm drivers. > > > > Yeah Ideally all the locking in the drm path would be trylock only. > > > > I wonder whether lockdep could help us validate this, with some "don't > > allow anything except trylocks in this context". It's easy to audit the > > core code with review, but drivers are much tougher. And often end up with > > really deep callchains to get at the backing buffers. > [……] > > > > > Instead what I had in mind is to recreate the worst > > > > > possible panic context as much as feasible (disabling interrupts should be > > > > > a good start, maybe we can even do an nmi callback), and then call our > > > > > panic implementation. That way we can test the panic handler in a > > > > > non-destructive way (i.e. aside from last dmesg lines printed to the > > > > > screen nothing bad happens to the kernel: No real panic, no oops, no > > > > > tainting). > > > > > > > > The interrupt case I can do, nmi I have no idea. > > > > > > > > > > I agree too. Disabling interrupts + CONFIG_DEBUG_ATOMIC_SLEEP > > > would be a nice non-destructive test-case emulation. > > > > See above, if we can somehow emulate "all locks are held, only allow > > trylock" with lockdep that would be great too. > > > > Hmmmm, to prototype things faster, I'll do it as a hook: > > spin_lock(spinlock_t *lock) > { > might_spin_forever_under_panic_path(); > ... > } > > Just like how mutexes and DEBUG_ATOMIC_SLEEP do it today: > > void mutex_lock(struct mutex *lock) > { > might_sleep(); > ... > } > > Hopefully the locking maintainers can accept something like this. > If not, let's play with lockdep (which is a bit complex). > > ===> Thanks to Sebastian for suggesting this! Ah yes this is a very nice idea. We only need to make sure that we don't enable any of this for a real oops. Or we just scroll away the real oops with tons of warnings. -Daniel > > > Plus nmi context, once/if that somehow becomes relevant. > > > > The thing that killed the old drm panic handling code definitely was that > > we flat out couldnt' test it except with real oopses. And that's just > > whack-a-mole and bug reporter frustration if you first have a few patch > > iterations around "oh sry, it's still some oops in the panic handler, not > > the first one that we're seeing" :-/ > > > > Oh, that's a critical point: > > full data > some data > no data > invalid/useless data > > First impressions for the feature will definitely count. Hopefully > the hooks above and some heavy DRM CI testing __beforehand__ can > improve things before publishing to a wider audience.. Yup, let's get this right! -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx