On Fri, Feb 07, 2025 at 10:08:20AM +0100, Vlastimil Babka wrote: > On 2/7/25 04:28, Hyesoo Yu wrote: > > On Thu, Feb 06, 2025 at 12:35:22PM +0100, Vlastimil Babka wrote: > >> On 2/5/25 01:46, Hyesoo Yu wrote: > >> > If a slab object is corrupted or an error occurs in its internal > >> > value, continuing after restoration may cause other side effects. > >> > At this point, it is difficult to debug because the problem occurred > >> > in the past. It is better to use WARN() instead of pr_err to catch > >> > errors at the point of issue because WARN() could trigger panic for > >> > system debugging when panic_on_warn is enabled. WARN() should be > >> > called prior to fixing the value because when a panic is triggered by WARN(), > >> > it allows us to check corrupted data. > >> > > >> > Changes in v2: > >> > - Replace direct calling with BUG_ON with the use of WARN in slab_fix. > >> > > >> > Signed-off-by: Hyesoo Yu <hyesoo.yu@xxxxxxxxxxx> > >> > >> Hi and thanks for the patch, > >> > >> I wonder if it would be better not to change slab_fix() but rather > >> slab_err() and object_err(). It wouldn't then require to rearrange the fixup > >> code. Also I think some error reporting paths don't go through slab_fix() > >> and we still would like them to become a WARN too. > >> > >> Basically it would mean the last line in slab_err() would be a WARN and we'd > >> drop the dump_stack() as that's redundant. Same in object_err() (no > >> dump_stack() there). It would be a bit noisier as a result, but hopefully > >> acceptable. The slab specific debugging info would still be printed before > >> the WARN hits (and potentially results in a panic) so anyone investigating > >> the crash dump would have that information. > >> > >> Hm but I see some places print stuff after slab_err(). slab_pad_check() and > >> list_slab_objects(). We could create slab_err_start() and slab_err_end() for > >> those, and slab_err() would just call both at once. > >> > > > > Thank you so much for your review. > > > > That's a great suggestion. Following your suggestion, it seems possible to use > > WARN on all error reporting paths. For some places print stuff after slab_err, > > print them as follows, > > > > +static void __slab_err(struct slab *slab) > > +{ > > + print_slab_info(slab); > > + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); > > + > > + WARN_ON(1); > > +} > > + > > static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab, > > const char *fmt, ...) > > { > > vsnprintf(buf, sizeof(buf), fmt, args); > > va_end(args); > > slab_bug(s, "%s", buf); > > - print_slab_info(slab); > > - dump_stack(); > > - add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); > > + __slab_err(slab); > > } > > > > @@ -1316,11 +1322,13 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab) > > while (end > fault && end[-1] == POISON_INUSE) > > end--; > > > > - slab_err(s, slab, "Padding overwritten. 0x%p-0x%p @offset=%tu", > > - fault, end - 1, fault - start); > > + slab_bug(s, "Padding overwritten. 0x%p-0x%p @offset=%tu", > > + fault, end - 1, fault - start); > > print_section(KERN_ERR, "Padding ", pad, remainder); > > > > restore_bytes(s, "slab padding", POISON_INUSE, fault, end); > > + > > + __slab_err(slab); > > Yeah but move that above restore_bytes()? > Yes, I will. > BTW I think we could also do this? > > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1162,7 +1162,7 @@ void skip_orig_size_check(struct kmem_cache *s, const void *object) > set_orig_size(s, (void *)object, s->object_size); > } > > -static void slab_bug(struct kmem_cache *s, char *fmt, ...) > +static void slab_bug(struct kmem_cache *s, const char *fmt, ...) > { > struct va_format vaf; > va_list args; > @@ -1263,15 +1263,11 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab, > const char *fmt, ...) > { > va_list args; > - char buf[100]; > > if (slab_add_kunit_errors()) > return; > > - va_start(args, fmt); > - vsnprintf(buf, sizeof(buf), fmt, args); > - va_end(args); > - slab_bug(s, "%s", buf); > + slab_bug(s, fmt, args); > > I guess the args is not initialized. Do you want it to be modified like this ? Thanks, Regards. -static void slab_bug(struct kmem_cache *s, char *fmt, ...) +static void slab_bug(struct kmem_cache *s, const char *fmt, ...) { struct va_format vaf; va_list args; @@ -1137,15 +1137,14 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab, const char *fmt, ...) { va_list args; - char buf[100]; if (slab_add_kunit_errors()) return; va_start(args, fmt); - vsnprintf(buf, sizeof(buf), fmt, args); + slab_bug(s, fmt, args); va_end(args); - slab_bug(s, "%s", buf); > > } > > > >> > --- > >> > mm/slub.c | 10 +++++----- > >> > 1 file changed, 5 insertions(+), 5 deletions(-) > >> > > >> > diff --git a/mm/slub.c b/mm/slub.c > >> > index 1f50129dcfb3..ea956cb4b8be 100644 > >> > --- a/mm/slub.c > >> > +++ b/mm/slub.c > >> > @@ -1043,7 +1043,7 @@ static void slab_fix(struct kmem_cache *s, char *fmt, ...) > >> > va_start(args, fmt); > >> > vaf.fmt = fmt; > >> > vaf.va = &args; > >> > - pr_err("FIX %s: %pV\n", s->name, &vaf); > >> > + WARN(1, "FIX %s: %pV\n", s->name, &vaf); > >> > va_end(args); > >> > } > >> > > >> > @@ -1106,8 +1106,8 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab, > >> > if ((s->flags & SLAB_CONSISTENCY_CHECKS) && > >> > !check_valid_pointer(s, slab, nextfree) && freelist) { > >> > object_err(s, slab, *freelist, "Freechain corrupt"); > >> > - *freelist = NULL; > >> > slab_fix(s, "Isolate corrupted freechain"); > >> > + *freelist = NULL; > >> > return true; > >> > } > >> > > >> > @@ -1445,9 +1445,9 @@ static int on_freelist(struct kmem_cache *s, struct slab *slab, void *search) > >> > set_freepointer(s, object, NULL); > >> > } else { > >> > slab_err(s, slab, "Freepointer corrupt"); > >> > + slab_fix(s, "Freelist cleared"); > >> > slab->freelist = NULL; > >> > slab->inuse = slab->objects; > >> > - slab_fix(s, "Freelist cleared"); > >> > return 0; > >> > } > >> > break; > >> > @@ -1464,14 +1464,14 @@ static int on_freelist(struct kmem_cache *s, struct slab *slab, void *search) > >> > if (slab->objects != max_objects) { > >> > slab_err(s, slab, "Wrong number of objects. Found %d but should be %d", > >> > slab->objects, max_objects); > >> > - slab->objects = max_objects; > >> > slab_fix(s, "Number of objects adjusted"); > >> > + slab->objects = max_objects; > >> > } > >> > if (slab->inuse != slab->objects - nr) { > >> > slab_err(s, slab, "Wrong object count. Counter is %d but counted were %d", > >> > slab->inuse, slab->objects - nr); > >> > - slab->inuse = slab->objects - nr; > >> > slab_fix(s, "Object count adjusted"); > >> > + slab->inuse = slab->objects - nr; > >> > } > >> > return search == NULL; > >> > } > >> > >> > > > > > >