On Thu, Oct 27, 2022 at 10:24:51AM +0800, Yujie Liu wrote: > On 10/27/2022 10:09, Yujie Liu wrote: > > On Wed, Oct 26, 2022 at 02:06:17PM +0200, Peter Zijlstra wrote: > > > On Wed, Oct 26, 2022 at 11:51:14AM +0200, Peter Zijlstra wrote: > > > > On Wed, Oct 26, 2022 at 05:10:28PM +0800, kernel test robot wrote: > > > > > Hi Peter, > > > > > > > > > > For below patch, we couldn't find any connection between the code > > > > > change of kallsyms and xfstests testcase, but we got very stable test > > > > > results. We tested commit f1389181622a and its parent commit > > > > > 7825451fa4dc for 12 runs each, parent was 100% good while this commit > > > > > was 100% bad, so we still send out this report FYI though we don't have > > > > > clear clue of the root cause. Please check the details below if > > > > > interested in further investigation. Thanks. > > > > > > > > *groan* I'll go have a poke. > > > > > > > > Also; I've had these patches in my queue.git for over a month now and > > > > this is the first report, how comes? > > > > Hi Peter, > > > > Sorry about late report. We did catch this problem on > > call-depth-tracking branch of queue.git in Semtember, but we used to > > think it may be a false alarm since we cannot connect the code change > > with xfstests, so didn't report at that time. After merged to x86 tip, > > we did more tests and reviewed this problem with more developers, and > > sent out the report finally. > > > > > dmesg starts at 42 seconds, you don't happen to have a complete one? > > > > Sorry for the incomplete dmesg. Please check if attached kmsg.xz helps? > > The time may vary a little since it is captured from another run. > > Thanks. > > > > Not sure if this has anything to do with xfs or testcases, so add > > xfs folks to consult. > > > > Hi xfs folks, > > > > Could you please help check the original report at > > https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@xxxxxxxxx > > to see if can find any clue for the problem? Thanks. > > Sorry, wrong link. Should be: > > https://lore.kernel.org/all/202210241614.2ae4c1f5-yujie.liu@xxxxxxxxx/ xfs/439 corrupts the log and checks that log recovery will notice the corruption and abort the mount. Excerpted dmesg from the link above: [ 75.708690][ T3668] ------------[ cut here ]------------ [ 75.714041][ T3668] WARNING: CPU: 1 PID: 3668 at fs/xfs/xfs_message.c:104 xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs] [ 75.779828][ T3668] CPU: 1 PID: 3668 Comm: mount Not tainted 6.1.0-rc1-00053-gf1389181622a #1 [ 75.788389][ T3668] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013 [ 75.796864][ T3668] RIP: 0010:xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs] [ 75.898077][ T3668] xlog_find_verify_log_record (fs/xfs/xfs_log_recover.c:431) xfs [ 75.910856][ T3668] xlog_find_head (fs/xfs/xfs_log_recover.c:705) xfs [ 75.925847][ T3668] xlog_find_tail (fs/xfs/xfs_log_recover.c:1256) xfs [ 75.951249][ T3668] xlog_recover (fs/xfs/xfs_log_recover.c:3355) xfs [ 75.961324][ T3668] xfs_log_mount (fs/xfs/xfs_log.c:739) xfs [ 75.966486][ T3668] xfs_mountfs (fs/xfs/xfs_mount.c:805 (discriminator 4)) xfs [ 75.998070][ T3668] xfs_fs_fill_super (fs/xfs/xfs_super.c:1666) xfs So yes, this is xlog_find_verify_log_record behaving as intended: if (i < start_blk) { /* valid log record not found */ xfs_warn(log->l_mp, "Log inconsistent (didn't find previous header)"); ASSERT(0); <------------------ here error = -EFSCORRUPTED; goto out; } However, I noticed that the callsite in the WARNING: message has changed from the usual 'asswarn' (which is the caller of WARN_ON) to 'xfs_buf_alert_ratelimited', which seems totally wrong since XFS log recovery doesn't touch xfs_buf objects at all. The reason this turned into a fstests regression is that xfs/439 uses _filter_assert_dmesg to check for warnings in dmesg that are not the result of an ASSERT failing (common/rc in fstests): # We generate assert related WARNINGs on purpose and make sure # test doesn't fail because of these warnings. This is a helper # for _check_dmesg() to filter out them. _filter_assert_dmesg() { local warn1="WARNING:.*fs/xfs/xfs_message\.c:.*asswarn.*" local warn2="WARNING:.*fs/xfs/xfs_message\.c:.*assfail.*" sed -e "s#$warn1#Intentional warnings in asswarn#" \ -e "s#$warn2#Intentional warnings in assfail#" } As for why that commit reports the wrong caller ... I don't know. The function name and line number are correct. --D > > -- > Best Regards, > Yujie