Re: [bisected] ext4 corruption on parisc since 6.12

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hmm, this is my config, also on an rp3440:

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Timers subsystem

lindholm can confirm on their hardware/config. Maybe you can try that and see if you can reproduce? I will try your config as well.

On 2024-12-01 20:47, John David Anglin wrote:
I haven't seen any file system corruption on rp3440 with several weeks of running with clock events.  I just
started running 6.12.1 today though.

I have the following timer config:

# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

There was some concern about this change on systems where the CPU timers aren't synchronized.  what
systems do you see this on?

Dave

On 2024-12-01 7:26 p.m., matoro wrote:
Hi Helge, when booting 6.12 here myself and another user (CC'd) both observed our ext4 filesystems to be immediately corrupted in the same manner.

Every file that is read or written will have its access/modify times set to 2446-05-10 18:38:55.0000, which is the maximum ext4 timestamp.  The 32-bit userspace doesn't seem to be able to handle this at all, as every further stat() call will error with "Value too large for defined data type".  Unfortunately, simply rolling back to kernel 6.11 is insufficient to recover, as the filesystem corruption is persistent, and the errors come from userspace attempting to read the modified files.  I was able to recover with a command like:  find / -newermt 2446-01-01 -o -newerct 2446-01-01 -o -newerat 2446-01-01 | xargs touch -h

Luckily, lindholm was able to bisect and identified as the culprit commit:  b5ff52be891347f8847872c49d7a5c2fa29400a7 ("parisc: Convert to generic clockevents").  Some other comments from the discussion:

17:20:37 <awilfox> would be curious if keeping that patch + CONFIG_SMP=n fixes it
17:20:44 <awilfox> this doesn't look necessarily correct on MP machines
17:23:56 <awilfox> time_keeper_id is now unused; the old code specifically marked the clocksource as unstable on MP machines despite having per_cpu before 17:24:11 <awilfox> and now it seems to imply CLOCK_EVT_FEAT_PERCPU is enough to work around it
17:24:13 <awilfox> maybe it isn't

Thanks!




[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux