On Thu, Feb 06, 2025 at 12:35:22PM +0100, Vlastimil Babka wrote: > On 2/5/25 01:46, Hyesoo Yu wrote: > > If a slab object is corrupted or an error occurs in its internal > > value, continuing after restoration may cause other side effects. > > At this point, it is difficult to debug because the problem occurred > > in the past. It is better to use WARN() instead of pr_err to catch > > errors at the point of issue because WARN() could trigger panic for > > system debugging when panic_on_warn is enabled. WARN() should be > > called prior to fixing the value because when a panic is triggered by WARN(), > > it allows us to check corrupted data. > > > > Changes in v2: > > - Replace direct calling with BUG_ON with the use of WARN in slab_fix. > > > > Signed-off-by: Hyesoo Yu <hyesoo.yu@xxxxxxxxxxx> > > Hi and thanks for the patch, > > I wonder if it would be better not to change slab_fix() but rather > slab_err() and object_err(). It wouldn't then require to rearrange the fixup > code. Also I think some error reporting paths don't go through slab_fix() > and we still would like them to become a WARN too. > > Basically it would mean the last line in slab_err() would be a WARN and we'd > drop the dump_stack() as that's redundant. Same in object_err() (no > dump_stack() there). It would be a bit noisier as a result, but hopefully > acceptable. The slab specific debugging info would still be printed before > the WARN hits (and potentially results in a panic) so anyone investigating > the crash dump would have that information. > > Hm but I see some places print stuff after slab_err(). slab_pad_check() and > list_slab_objects(). We could create slab_err_start() and slab_err_end() for > those, and slab_err() would just call both at once. > Thank you so much for your review. That's a great suggestion. Following your suggestion, it seems possible to use WARN on all error reporting paths. For some places print stuff after slab_err, print them as follows, +static void __slab_err(struct slab *slab) +{ + print_slab_info(slab); + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + + WARN_ON(1); +} + static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab, const char *fmt, ...) { vsnprintf(buf, sizeof(buf), fmt, args); va_end(args); slab_bug(s, "%s", buf); - print_slab_info(slab); - dump_stack(); - add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + __slab_err(slab); } @@ -1316,11 +1322,13 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab) while (end > fault && end[-1] == POISON_INUSE) end--; - slab_err(s, slab, "Padding overwritten. 0x%p-0x%p @offset=%tu", - fault, end - 1, fault - start); + slab_bug(s, "Padding overwritten. 0x%p-0x%p @offset=%tu", + fault, end - 1, fault - start); print_section(KERN_ERR, "Padding ", pad, remainder); restore_bytes(s, "slab padding", POISON_INUSE, fault, end); + + __slab_err(slab); } > > --- > > mm/slub.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/mm/slub.c b/mm/slub.c > > index 1f50129dcfb3..ea956cb4b8be 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -1043,7 +1043,7 @@ static void slab_fix(struct kmem_cache *s, char *fmt, ...) > > va_start(args, fmt); > > vaf.fmt = fmt; > > vaf.va = &args; > > - pr_err("FIX %s: %pV\n", s->name, &vaf); > > + WARN(1, "FIX %s: %pV\n", s->name, &vaf); > > va_end(args); > > } > > > > @@ -1106,8 +1106,8 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab, > > if ((s->flags & SLAB_CONSISTENCY_CHECKS) && > > !check_valid_pointer(s, slab, nextfree) && freelist) { > > object_err(s, slab, *freelist, "Freechain corrupt"); > > - *freelist = NULL; > > slab_fix(s, "Isolate corrupted freechain"); > > + *freelist = NULL; > > return true; > > } > > > > @@ -1445,9 +1445,9 @@ static int on_freelist(struct kmem_cache *s, struct slab *slab, void *search) > > set_freepointer(s, object, NULL); > > } else { > > slab_err(s, slab, "Freepointer corrupt"); > > + slab_fix(s, "Freelist cleared"); > > slab->freelist = NULL; > > slab->inuse = slab->objects; > > - slab_fix(s, "Freelist cleared"); > > return 0; > > } > > break; > > @@ -1464,14 +1464,14 @@ static int on_freelist(struct kmem_cache *s, struct slab *slab, void *search) > > if (slab->objects != max_objects) { > > slab_err(s, slab, "Wrong number of objects. Found %d but should be %d", > > slab->objects, max_objects); > > - slab->objects = max_objects; > > slab_fix(s, "Number of objects adjusted"); > > + slab->objects = max_objects; > > } > > if (slab->inuse != slab->objects - nr) { > > slab_err(s, slab, "Wrong object count. Counter is %d but counted were %d", > > slab->inuse, slab->objects - nr); > > - slab->inuse = slab->objects - nr; > > slab_fix(s, "Object count adjusted"); > > + slab->inuse = slab->objects - nr; > > } > > return search == NULL; > > } > >