On Mon, Apr 04, 2022 at 05:18:16PM +0200, Marco Elver wrote:
> On Mon, 4 Apr 2022 at 16:20, Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> > On 4/4/22 10:10, Marco Elver wrote:
> > > On Mon, Apr 04, 2022 at 12:05PM +0900, Hyeonggon Yoo wrote:
> > > (Maybe CONFIG_KCSAN_STRICT=y is going to yield something? I still doubt
> > > it though, this bug is related to corrupted stackdepot handle
> > > somewhere...)
> > >
> > >> I noticed that it is not reproduced when KASAN=y and KFENCE=n (reproduced 0 of 181).
> > >> and it was reproduced 56 of 196 when KASAN=n and KFENCE=y
> > >>
> > >> maybe this issue is related to kfence?
> >
> > Hmm kfence seems to be a good lead. If I understand kfence_guarded_alloc()
> > correctly, it tries to set up something that really looks like a normal slab
> > page? Especially the part with comment /* Set required slab fields. */
> > But it doesn't seem to cover the debugging parts that SLUB sets up with
> > alloc_debug_processing(). This includes alloc stack saving, thus, after
> > commit 555b8c8cb3, a stackdepot handle setting. It probably normally doesn't
> > matter as is_kfence_address() redirects processing of kfence-allocated
> > objects so we don't hit any slub code that expects the debugging parts to be
> > properly initialized.
> >
> > But here we are in mem_dump_obj() -> kmem_dump_obj() -> kmem_obj_info().
> > Because kmem_valid_obj() returned true, fooled by folio_test_slab()
> > returning true because of the /* Set required slab fields. */ code.
> > Yet the illusion is not perfect and we read garbage instead of a valid
> > stackdepot handle.
> >
> > IMHO we should e.g. add the appropriate is_kfence_address() test into
> > kmem_valid_obj(), to exclude kfence-allocated objects? Sounds much simpler
> > than trying to extend the illusion further to make kmem_dump_obj() work?
> > Instead kfence could add its own specific handler to mem_dump_obj() to print
> > its debugging data?
>
> I think this explanation makes sense! Indeed, KFENCE already records
> allocation stacks internally anyway, so it should be straightforward
> to convince it to just print that.
>

Thank you both! Yeah the explanation makes sense... that's why KASAN/KCSAN
couldn't yield anything -- it was not overwritten.

I'm writing a fix and will test if the bug disappears.
This may take a few days.

Thanks!
Hyeonggon

> Thanks,
> -- Marco
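
For illustration only, the idea discussed above (exclude KFENCE-allocated
objects in kmem_valid_obj() before trusting the slab flag) can be modelled
in userspace roughly as follows. This is a sketch under stated assumptions,
not the fix mentioned in this mail and not kernel code: every mock_* name,
the pool size, and the two static buffers are hypothetical stand-ins for
is_kfence_address(), folio_test_slab() and kmem_valid_obj().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: none of these definitions are the kernel's. */
#define MOCK_KFENCE_POOL_SIZE 4096u

static char mock_kfence_pool[MOCK_KFENCE_POOL_SIZE]; /* stands in for the KFENCE pool */
static char mock_slab_page[4096];                    /* stands in for a real SLUB page */

/* Models is_kfence_address(): true if addr lies inside the pool. */
static bool mock_is_kfence_address(const void *addr)
{
    /* Unsigned subtraction wraps for addresses below the pool, so one
     * comparison covers both bounds. */
    uintptr_t off = (uintptr_t)addr - (uintptr_t)mock_kfence_pool;

    return off < MOCK_KFENCE_POOL_SIZE;
}

/* Models folio_test_slab(): KFENCE pages are dressed up to look like
 * slab pages, so this returns true for them as well. */
static bool mock_page_is_slab(const void *addr)
{
    uintptr_t off = (uintptr_t)addr - (uintptr_t)mock_slab_page;

    return mock_is_kfence_address(addr) || off < sizeof(mock_slab_page);
}

/* Models the suggested kmem_valid_obj() change: reject KFENCE objects
 * first, so callers never read SLUB debug data that was never set up. */
static bool mock_kmem_valid_obj(const void *addr)
{
    if (mock_is_kfence_address(addr))
        return false;

    return mock_page_is_slab(addr);
}

/* Usage: a KFENCE object passes the slab test but is no longer "valid". */
static void mock_self_check(void)
{
    assert(mock_page_is_slab(&mock_kfence_pool[64]));
    assert(!mock_kmem_valid_obj(&mock_kfence_pool[64]));
    assert(mock_kmem_valid_obj(&mock_slab_page[64]));
}
```

The early return keeps the slab-looking setup of KFENCE pages untouched and
only stops the dump path from treating them as SLUB objects; printing
KFENCE's own recorded stacks would be a separate handler, which is not
modelled here.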