Re: ext4 damage suspected in between 5.15.167 - 5.15.170

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ted,

On Thu, Dec 12, 2024 at 09:31:05PM +0300, Nikolai Zhubr wrote:
This is to report that after jumping from generic kernel 5.15.167 to
5.15.170 I apparently observe ext4 damage.

Hi Nick,

In general this is not something that upstream kernel developers will
pay a lot of attention to try to root cause.  If you can come up with

Thanks for a quick and detailed reply. That's really appreciated. I need to clarify. I'm not a hardcore kernel developer at all, I just touch it a little bit occasionally, for random reasons. Debugging the situation thoroughly so as to find and prove the cause is far beyond my capability and also not exactly my personal or professional interest. I also don't need any sort of support (i.e. as a client) - I've already repaired and validated/restored from backups almost everything now, and I can just stick at 5.15.167 for basically as long as I like.

On the other hand, having buggy kernels (to the point of ext4 fs corruption) published as suitable for wide general use is not a good thing in my book, therefore I believe in the case of reasonable suspects I must at least raise a warning about it, and if I can somehow contribute to tracking the problem I'll do what I'm able to.

Not going to argue, but it'd seem if 5.15 is totally out of interest already, why keep patching it? And as long as it keeps receiving patches, supposedly they are backported and applied to stabilize, not damage it? Ok, nevermind :-)

People will also pay more attention if you give more detail in your
message.  Not just some vague "ext4 damage" (where 99% of time, these
sorts of things happen due to hardware-induced corruption), but the
exact message when mount failed.

Yes. That is why I spent 2 days for solely testing hardware, booting from separate media, stressing everything, and making plenty of copies. As I mentioned in my initial post, this had revealed no hardware issues. And I'm enjoying md raid-1 since around 2003 already (Not on this device though). I can post all my "smart" values as is, but I can assure they are perfectly fine for both raid-1 members. I encounter faulty hdds elsewhere routinely so its not something unseen too.

#smartctl -a /dev/nvme0n1 | grep Spare
Available Spare:                    100%
Available Spare Threshold:          10%

#smartctl -a /dev/sda | grep Sector
Sector Sizes:     512 bytes logical, 4096 bytes physical
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 0 197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0

I have a copy of the entire ext4 partition taken immediately as mount first failed, it is ~800Gb and may contain some sensitive data so I cannot just hand it to someone else or publish for examination. But I can now easily do a replay of mount failure and fsck processing as many times as needed. For now, it seems file/dir bodies had not been damaged, just some system areas had. I've not encountered any file which would give wrong checksum or otherwise appeared definitely damaged, with overall like 95% verified and definitely fine, 5% hard to reliably verify but those are less important files.

Also helpful when reporting ext4 issues, it's helpful to include
information about the file system configuration using "dumpe2fs -h

This is a dump run on a standalone copy taken before repair (after successful raid re-check):

#dumpe2fs -h /dev/sdb1
Filesystem volume name:   DATA
Last mounted on:          /opt
Filesystem UUID:          ea823c6c-500f-4bf0-a4a7-a872ed740af3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              51634176
Block count:              206513920
Reserved block count:     10325696
Overhead clusters:        3292742
Free blocks:              48135978
Free inodes:              50216050
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Tue Jul  9 01:51:16 2024
Last mount time:          Mon Dec  9 10:08:27 2024
Last write time:          Tue Dec 10 04:08:17 2024
Mount count:              273
Maximum mount count:      -1
Last checked:             Tue Jul  9 01:51:16 2024
Check interval:           0 (<none>)
Lifetime writes:          913 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      60bfa28b-cdd2-4ba6-8261-87961db4ecea
Journal backup:           inode blocks
FS Error count:           293
First error time:         Tue Dec 10 06:17:23 2024
First error function:     ext4_lookup
First error line #:       1437
First error inode #:      20709377
Last error time:          Tue Dec 10 21:12:30 2024
Last error function:      ext4_lookup
Last error line #:        1437
Last error inode #:       20709377
Journal features:         journal_incompat_revoke journal_64bit
Total journal size:       128M
Total journal blocks:     32768
Max transaction length:   32768
Fast commit length:       0
Journal sequence:         0x00064c6e
Journal start:            0

/dev/XXX".  Extracting kernel log messages that include the string
"EXT4-fs", via commands like "sudo dmesg | grep EXT4-fs", or "sudo
journalctl | grep EXT4-fs", or "grep EXT4-fs /var/log/messages" are
also helpful, as is getting a report from fsck via a command like

#grep EXT4-fs messages-20241212 | grep md126
2024-12-06T11:53:09.471317+03:00 lenovo-zh kernel: [ 7.649474][ T1124] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5 2024-12-06T11:53:09.471351+03:00 lenovo-zh kernel: [ 7.899321][ T1124] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none. 2024-12-07T12:03:18.518047+03:00 lenovo-zh kernel: [ 7.633150][ T1106] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5 2024-12-07T12:03:18.518054+03:00 lenovo-zh kernel: [ 7.951716][ T1106] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none. 2024-12-08T12:41:33.686145+03:00 lenovo-zh kernel: [ 7.588405][ T1118] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5 2024-12-08T12:41:33.686148+03:00 lenovo-zh kernel: [ 7.679963][ T1118] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
(* normal boot failed and subsequently fsck was run on real data here *)
2024-12-10T18:21:40.356656+03:00 lenovo-zh kernel: [ 483.522025][ T1740] EXT4-fs (md126): failed to initialize system zone (-117) 2024-12-10T18:21:40.356685+03:00 lenovo-zh kernel: [ 483.522050][ T1740] EXT4-fs (md126): mount failed 2024-12-11T02:00:18.382301+03:00 lenovo-zh kernel: [ 490.551080][ T1809] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none. 2024-12-11T12:00:53.249626+03:00 lenovo-zh kernel: [ 7.550823][ T1056] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5 2024-12-11T12:00:53.249629+03:00 lenovo-zh kernel: [ 7.662317][ T1056] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.

#grep md126 messages-20241212
2024-12-07T12:03:18.518038+03:00 lenovo-zh kernel: [ 7.154448][ T992] md126: detected capacity change from 0 to 1652111360 2024-12-07T12:03:18.518047+03:00 lenovo-zh kernel: [ 7.633150][ T1106] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5 2024-12-07T12:03:18.518054+03:00 lenovo-zh kernel: [ 7.951716][ T1106] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none. 2024-12-08T12:41:33.685280+03:00 lenovo-zh systemd[1]: Started Timer to wait for more drives before activating degraded array md126.. 2024-12-08T12:41:33.685325+03:00 lenovo-zh systemd[1]: mdadm-last-resort@md126.timer: Deactivated successfully. 2024-12-08T12:41:33.685327+03:00 lenovo-zh systemd[1]: Stopped Timer to wait for more drives before activating degraded array md126.. 2024-12-08T12:41:33.686136+03:00 lenovo-zh kernel: [ 7.346744][ T1107] md/raid1:md126: active with 2 out of 2 mirrors 2024-12-08T12:41:33.686137+03:00 lenovo-zh kernel: [ 7.357218][ T1107] md126: detected capacity change from 0 to 1652111360 2024-12-08T12:41:33.686145+03:00 lenovo-zh kernel: [ 7.588405][ T1118] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5 2024-12-08T12:41:33.686148+03:00 lenovo-zh kernel: [ 7.679963][ T1118] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
(* on 2024-12-09 system refused to boot and no normal log was written *)
2024-12-10T18:13:44.862091+03:00 lenovo-zh systemd[1]: Started Timer to wait for more drives before activating degraded array md126.. 2024-12-10T18:13:45.164589+03:00 lenovo-zh kernel: [ 8.332616][ T1248] md/raid1:md126: active with 2 out of 2 mirrors 2024-12-10T18:13:45.196580+03:00 lenovo-zh kernel: [ 8.363066][ T1248] md126: detected capacity change from 0 to 1652111360 2024-12-10T18:13:45.469396+03:00 lenovo-zh systemd[1]: mdadm-last-resort@md126.timer: Deactivated successfully. 2024-12-10T18:13:45.469584+03:00 lenovo-zh systemd[1]: Stopped Timer to wait for more drives before activating degraded array md126.. 2024-12-10T18:18:51.652575+03:00 lenovo-zh kernel: [ 314.821429][ T1657] md: data-check of RAID array md126 2024-12-10T18:21:40.356656+03:00 lenovo-zh kernel: [ 483.522025][ T1740] EXT4-fs (md126): failed to initialize system zone (-117) 2024-12-10T18:21:40.356685+03:00 lenovo-zh kernel: [ 483.522050][ T1740] EXT4-fs (md126): mount failed 2024-12-10T20:07:29.116652+03:00 lenovo-zh kernel: [ 6832.284366][ T1657] md: md126: data-check done.
(fsck was run on real data here)
2024-12-11T01:52:15.839052+03:00 lenovo-zh systemd[1]: Started Timer to wait for more drives before activating degraded array md126.. 2024-12-11T01:52:15.840396+03:00 lenovo-zh kernel: [ 7.832271][ T1170] md/raid1:md126: active with 2 out of 2 mirrors 2024-12-11T01:52:15.840397+03:00 lenovo-zh kernel: [ 7.845385][ T1170] md126: detected capacity change from 0 to 1652111360 2024-12-11T01:52:16.255454+03:00 lenovo-zh systemd[1]: mdadm-last-resort@md126.timer: Deactivated successfully. 2024-12-11T01:52:16.255573+03:00 lenovo-zh systemd[1]: Stopped Timer to wait for more drives before activating degraded array md126.. 2024-12-11T02:00:18.382301+03:00 lenovo-zh kernel: [ 490.551080][ T1809] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

"fsck.ext4 -fn /dev/XXX >& /tmp/fsck.out"

This is a fsck run on a standalone copy taken before repair (after successful raid re-check):

#fsck.ext4 -fn /dev/sdb1
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
fsck.ext4: Group descriptors look bad... trying backup blocks...
Pass 1: Checking inodes, blocks, and sizes
Inode 9185447 extent tree (at level 1) could be narrower.  Optimize? no
Inode 9189969 extent tree (at level 1) could be narrower.  Optimize? no
Inode 22054610 extent tree (at level 1) could be shorter.  Optimize? no
Inode 22959998 extent tree (at level 1) could be shorter.  Optimize? no
Inode 23351116 extent tree (at level 1) could be shorter.  Optimize? no
Inode 23354700 extent tree (at level 1) could be shorter.  Optimize? no
Inode 23363083 extent tree (at level 1) could be shorter.  Optimize? no
Inode 25197205 extent tree (at level 1) could be narrower.  Optimize? no
Inode 25197271 extent tree (at level 1) could be narrower.  Optimize? no
Inode 47710225 extent tree (at level 1) could be narrower.  Optimize? no
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (23414, counted=22437).
Fix? no
Free blocks count wrong for group #1 (31644, counted=7).
Fix? no
Free blocks count wrong for group #2 (32768, counted=0).
Fix? no
Free blocks count wrong for group #3 (31644, counted=4).
Fix? no

[repeated tons of times]

Free inodes count wrong for group #4895 (8192, counted=8044).
Fix? no
Directories count wrong for group #4895 (0, counted=148).
Fix? no
Free inodes count wrong for group #4896 (8192, counted=8114).
Fix? no
Directories count wrong for group #4896 (0, counted=13).
Fix? no
Free inodes count wrong for group #5824 (8192, counted=8008).
Fix? no
Directories count wrong for group #5824 (0, counted=31).
Fix? no
Free inodes count wrong (51634165, counted=50157635).
Fix? no
DATA: ********** WARNING: Filesystem still has errors **********
DATA: 11/51634176 files (73845.5% non-contiguous), 3292748/206513920 blocks

And because there are apparently 0 commits to ext4 in 5.15 since
5.15.168 at the moment, I thought I'd report.

Did you check for any changes to the md/dm code, or the block layer?

No. Generally, it could be just anything, therefore I see no point even starting without good background knowledge. That is why I'm trying to draw attention of those who are more aware instead. :-)

Also, if you checked for I/O errors in the system logs, or run
"smartctl" on the block devices, please say so.  (And if there are
indications of I/O errors or storage device issues, please do
immediate backups and make plans to replace your hardware before you

I have not found any indication of hardware errors at this point.

#grep -i err messages-20241212 | grep sda
(nothing)
#grep -i err messages-20241212 | grep nvme
(nothing)

Some "smart" values are posted above. Nothing suspicious whatsoever.


Thank you!

Regards,

Nick

suffer more serious data loss.)

Finally, if you want more support than what volunteers in the upstream
linux kernel community can provide, this is what paid support from
companies like SuSE, or Red Hat, can provide.

Cheers,

							- Ted




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux