Re: [PATCH v3 1/2] mm: slub: Print the broken data before restoring slub.

Hyesoo Yu <hyesoo.yu@xxxxxxxxxxx> · Tue, 25 Feb 2025 10:09:09 +0900

On Mon, Feb 24, 2025 at 03:08:55PM +0100, Vlastimil Babka wrote:
> On 2/24/25 03:43, Hyesoo Yu wrote:
> > On Fri, Feb 21, 2025 at 05:16:01PM +0900, Harry Yoo wrote:
> >> On Thu, Feb 20, 2025 at 12:39:43PM +0900, Hyesoo Yu wrote:
> >> > Previously, the restore occurred after printing the object in slub.
> >> > After commit 47d911b02cbe ("slab: make check_object() more consistent"),
> >> > the bytes are printed after the restore. This information about the bytes
> >> > before the restore is highly valuable for debugging purposes.
> >> > For instance, in the event of a cache issue, it displays byte patterns
> >> > by breaking them down into 64-byte units. Without this information,
> >> > we can only speculate on how it was broken. Hence the corrupted regions
> >> > should be printed prior to the restoration process. However if an object
> >> > breaks in multiple places, the same log may be output multiple times.
> >> > Therefore the slub log is reported only once to prevent redundant printing,
> >> > by sending a parameter indicating whether an error has occurred previously.
> >> > 
> >> > Changes in v3:
> >> > - Change the parameter type of check_bytes_and_report.
> >> > 
> >> > Changes in v2:
> >> > - Instead of using print_section every time on check_bytes_and_report,
> >> > just print it once for the entire slub object before the restore.
> >> > 
> >> > Signed-off-by: Hyesoo Yu <hyesoo.yu@xxxxxxxxxxx>
> >> > Change-Id: I73cf76c110eed62506643913517c957c05a29520
> >> > ---
> >> >  mm/slub.c | 29 ++++++++++++++---------------
> >> >  1 file changed, 14 insertions(+), 15 deletions(-)
> >> > 
> >> 
> >> > @@ -1212,11 +1213,14 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> >> >  	if (slab_add_kunit_errors())
> >> >  		goto skip_bug_print;
> >> >  
> >> > -	slab_bug(s, "%s overwritten", what);
> >> >  	pr_err("0x%p-0x%p @offset=%tu. First byte 0x%x instead of 0x%x\n",
> >> >  					fault, end - 1, fault - addr,
> >> >  					fault[0], value);
> >> >  
> >> > +	scnprintf(buf, 100, "%s overwritten", what);
> >> > +	if (slab_obj_print)
> >> > +		object_err(s, slab, object, buf);
> >> 
> >> 
> >> Wait, I think it's better to keep printing "%s overwritten" regardless
> >> of slab_obj_print and only call __slab_err() if slab_obj_print == true
> >> as discussed here [1]? Because in case there are multiple errors,
> >> users should know.
> >> 
> >> [1] https://lore.kernel.org/all/2ff52c5e-4b6b-4b3d-9047-f00967315d3e@xxxxxxx
> >> 
> > 
> > Hi,
> > 
> > __slab_err() doesn't include print_trailer(). It needs object_err().
> 
> print_trailer() could be used directly?
>

object_err() already calls print_trailer(), add_taint() and WARN_ON(), all of
which we need here, so I think calling print_trailer() directly is just redundant.

> > How about including the specific error name 'what' to pr_err ?
> > And then object_err would print "Object corrupt" at the beginning once
> > without buf like below.
> 
> Could also work.
> 
> > 	if (slab_obj_print)
> > 		object_err(s, slab, object, "Object corrupt");
> > 
> > 	pr_err("[%s] 0x%p-0x%p @offset=%tu. First byte 0x%x instead of 0x%x\n",
> >  	       what, fault, end - 1, fault - addr, fault[0], value);
> 
> Probably in opposite order so object_err doesn't panic_on_warn before the
> pr_err?
> 

Yes, I tested it and found that the logs are not printed when panic_on_warn
is enabled. So we should call pr_err first and then call object_err.
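
A rough user-space sketch of that ordering, for illustration only. The
mock_* helpers, report_fault() and the panic flags are hypothetical stand-ins
and not the real mm/slub.c code; the mock simply drops every message emitted
after the simulated panic, which is the behaviour the ordering has to avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-ins; the real helpers live in mm/slub.c. */
static char event_log[256];
static bool panic_on_warn_mock;	/* simulates the panic_on_warn setting */
static bool panicked;		/* set once the simulated WARN has "panicked" */

static void reset_mock(bool panic_on_warn)
{
	event_log[0] = '\0';
	panic_on_warn_mock = panic_on_warn;
	panicked = false;
}

/* Append a tag to the log; nothing is recorded after a simulated panic. */
static void log_event(const char *tag)
{
	assert(tag != NULL);
	if (panicked)
		return;
	strncat(event_log, tag, sizeof(event_log) - strlen(event_log) - 1);
}

/* Stand-in for the pr_err() line that carries the error name 'what'. */
static void mock_pr_err(const char *what)
{
	char buf[64];

	snprintf(buf, sizeof(buf), "pr_err[%s];", what);
	log_event(buf);
}

/* Stand-in for object_err(): prints once, then warns (and may panic). */
static void mock_object_err(void)
{
	log_event("object_err;");
	if (panic_on_warn_mock)
		panicked = true;
}

/* Proposed order: fault details first, object dump and WARN afterwards. */
static void report_fault(const char *what, bool slab_obj_print)
{
	mock_pr_err(what);
	if (slab_obj_print)
		mock_object_err();
}
```

With the calls in this order, the mock still records the pr_err line when
panic_on_warn is set. Swapping the two calls inside report_fault() would
drop it, because the simulated panic happens first.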

> > Thanks,
> > Regards.
> >> -- 
> >> Cheers,
> >> Harry
> >> 
> > 
> > 
> 
>