On Thu, Feb 15, 2024 at 4:25 PM Peter Collingbourne <pcc@xxxxxxxxxx> wrote:
>
> On Thu, Feb 15, 2024 at 3:37 PM Andrey Konovalov <andreyknvl@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 15, 2024 at 10:58 PM Oscar Salvador <osalvador@xxxxxxx> wrote:
> > >
> > > The very first entry of stack_record gets a handle of 0, but this is wrong
> > > because stackdepot treats a 0-handle as a non-valid one.
> > > E.g: See the check in stack_depot_fetch()
> > >
> > > Fix this by adding and offset of 1.
> > >
> > > This bug has been lurking since the very beginning of stackdepot,
> > > but no one really cared as it seems.
> > > Because of that I am not adding a Fixes tag.
> > >
> > > Co-developed-by: Marco Elver <elver@xxxxxxxxxx>
> > > Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> > > Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
> > > Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> > > ---
> > >  lib/stackdepot.c | 16 +++++++++-------
> > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > > index 4a7055a63d9f..c043a4186bc5 100644
> > > --- a/lib/stackdepot.c
> > > +++ b/lib/stackdepot.c
> > > @@ -45,15 +45,16 @@
> > >  #define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
> > >                                STACK_DEPOT_EXTRA_BITS)
> > >  #define DEPOT_POOLS_CAP 8192
> > > +/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
> > >  #define DEPOT_MAX_POOLS \
> > > -       (((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
> > > -        (1LL << (DEPOT_POOL_INDEX_BITS)) : DEPOT_POOLS_CAP)
> > > +       (((1LL << (DEPOT_POOL_INDEX_BITS)) - 1 < DEPOT_POOLS_CAP) ? \
> > > +        (1LL << (DEPOT_POOL_INDEX_BITS)) - 1 : DEPOT_POOLS_CAP)
> > >
> > >  /* Compact structure that stores a reference to a stack. */
> > >  union handle_parts {
> > >         depot_stack_handle_t handle;
> > >         struct {
> > > -               u32 pool_index  : DEPOT_POOL_INDEX_BITS;
> > > +               u32 pool_index  : DEPOT_POOL_INDEX_BITS; /* pool_index is offset by 1 */
>
> Can we rename this, say to pool_index_plus_1? This will make the code
> a bit clearer, as well as make it possible for debugging tools such as
> drgn [1] to be able to tell when the off-by-one was introduced and
> adapt accordingly.
>
> Peter
>
> [1] https://github.com/osandov/drgn/pull/376

Unfortunately this message was not acted upon, and it looks like akpm
picked up the patch and it made its way into Linus's tree. So I sent a
followup to fix this here:
https://lore.kernel.org/all/20240402001500.53533-1-pcc@xxxxxxxxxx/

Peter
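
For anyone following the thread, here is a minimal userspace sketch of why
the first record ends up with a 0 handle and how storing the pool index
plus 1 avoids it. It is not the code in lib/stackdepot.c: the field widths,
the sketch_ names and the bias parameter are assumptions made up for
illustration, and only the field name pool_index_plus_1 comes from the
rename proposed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths (sum to 32); the kernel derives its own. */
#define SKETCH_POOL_INDEX_BITS 17
#define SKETCH_OFFSET_BITS     10
#define SKETCH_EXTRA_BITS       5

typedef uint32_t sketch_handle_t;

/* Same shape as union handle_parts, using the proposed field name. */
union sketch_handle_parts {
	sketch_handle_t handle;
	struct {
		uint32_t pool_index_plus_1 : SKETCH_POOL_INDEX_BITS;
		uint32_t offset            : SKETCH_OFFSET_BITS;
		uint32_t extra             : SKETCH_EXTRA_BITS;
	};
};

/*
 * Pack a (pool_index, offset) pair into a handle.
 * bias == 0 models the old scheme (raw index stored);
 * bias == 1 models the scheme after the patch (index + 1 stored).
 */
sketch_handle_t sketch_encode(uint32_t pool_index, uint32_t offset,
			      uint32_t bias)
{
	union sketch_handle_parts parts = { .handle = 0 };

	/* The stored values must fit in their bit-fields. */
	assert(pool_index + bias < (1u << SKETCH_POOL_INDEX_BITS));
	assert(offset < (1u << SKETCH_OFFSET_BITS));

	parts.pool_index_plus_1 = pool_index + bias;
	parts.offset = offset;
	return parts.handle;
}

/*
 * Recover the real pool index from a biased handle.
 * Callers are expected to reject a 0 handle before calling this.
 */
uint32_t sketch_decode_pool_index(sketch_handle_t handle)
{
	union sketch_handle_parts parts = { .handle = handle };

	return parts.pool_index_plus_1 - 1;
}
```

With bias 0, the record at pool 0 and offset 0 encodes to handle 0, which
is indistinguishable from "no handle". With bias 1 the same record gets a
nonzero handle, and any reader of the handle has to subtract 1 to get the
pool index back. That is why a tool decoding handles from outside the
kernel needs to know which scheme is in use, and a field rename gives it
something to check.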