Re: [PATCH 00/32] kasan: switch tag-based modes to stack ring from per-object metadata

Andrey Konovalov <andreyknvl@xxxxxxxxx> · Tue, 19 Jul 2022 00:41:08 +0200

On Fri, Jun 17, 2022 at 11:32 AM Marco Elver <elver@xxxxxxxxxx> wrote:
>
> > The disadvantage:
> >
> > - If the affected object was allocated/freed long before the bug happened
> >   and the stack trace events were purged from the stack ring, the report
> >   will have no stack traces.
>
> Do you have statistics on how likely this is? Maybe through
> identifying what the average lifetime of an entry in the stack ring is?
>
> How bad is this for very long lived objects (e.g. pagecache)?

I ran a test on Pixel 6: the stack ring of size (32 << 10) gets fully
rewritten every ~2.7 seconds during boot. Any buggy object that is
accessed more than that long after being allocated/freed will not
have stack traces in the report.

This can be dealt with by increasing the stack ring size, but this
comes down to how much memory one is willing to allocate for the stack
ring. If we decide to use sampling (saving stack traces only for every
Nth object), that will affect this too.
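
To make the size/retention tradeoff concrete, here is a rough
back-of-the-envelope sketch in plain userspace C. It is not code from
the series: the helper names are made up for illustration, the event
rate is assumed to stay at the boot-time rate measured above, and the
per-entry size is left as a caller-supplied parameter since it depends
on the entry layout.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the patch series.
 * Times are in milliseconds to keep the arithmetic in integers.
 */

/* Events recorded per second if the whole ring is rewritten every rewrite_ms. */
static uint64_t ring_events_per_sec(uint64_t entries, uint64_t rewrite_ms)
{
	return entries * 1000 / rewrite_ms;
}

/*
 * Entries needed to keep window_ms of history, assuming the same
 * event rate as measured (rounded up).
 */
static uint64_t ring_entries_for_window(uint64_t entries, uint64_t rewrite_ms,
					uint64_t window_ms)
{
	return (entries * window_ms + rewrite_ms - 1) / rewrite_ms;
}

/* Memory for the ring; entry_size is an assumption supplied by the caller. */
static uint64_t ring_bytes(uint64_t entries, uint64_t entry_size)
{
	return entries * entry_size;
}
```

With the measured numbers (32 << 10 entries, 2700 ms), keeping one
minute of history would take roughly 22 times as many entries, and the
memory cost scales linearly with that. Sampling every Nth object would
stretch the window by about a factor of N at the same ring size.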

But any object that is allocated once during boot will be purged out
of the stack ring sooner or later. One could argue that such objects
are usually allocated at a single known place, so having a stack trace
won't considerably improve the report.

I would say that we need to deploy some solution, study the reports,
and adjust the implementation based on that.

> > Discussion
> > ==========
> >
> > The current implementation of the stack ring uses a single ring buffer for
> > the whole kernel. This might lead to contention due to atomic accesses to
> > the ring buffer index on multicore systems.
> >
> > It is unclear to me whether the performance impact from this contention
> > is significant compared to the slowdown introduced by collecting stack
> > traces.
>
> I agree, but once stack trace collection becomes faster (per your future
> plans below), this might need to be revisited.

Ack.
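
For reference, the contention point being discussed looks roughly like
the following. This is a simplified userspace sketch using C11 atomics,
not the kernel code: the struct layout, the tiny ring size, and the
ring_claim() name are assumptions made for illustration only.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8	/* tiny on purpose; illustration only */

/* Hypothetical entry; a real one would also carry a stack handle etc. */
struct ring_entry {
	uintptr_t ptr;
};

struct ring {
	atomic_uint_fast64_t pos;	/* single index shared by all CPUs */
	struct ring_entry entries[RING_SIZE];
};

/*
 * Claim the next slot. Every caller on every CPU does an atomic
 * read-modify-write on the same counter, which is where cache line
 * contention can show up on multicore systems.
 */
static struct ring_entry *ring_claim(struct ring *r)
{
	uint_fast64_t pos = atomic_fetch_add(&r->pos, 1);

	return &r->entries[pos % RING_SIZE];
}
```

Per-CPU rings would avoid the shared counter, at the cost of having to
search several rings when printing a report.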

Thanks!